Passing threshold: 75%
AskUserQuestion tool to inject a clarifying prompt at the next natural pause in Claude's execution, then resume the task with the user's updated direction.
canUseTool callback to intercept every tool call and check whether the user wants to change direction before each step.
AskUserQuestion is designed for Claude to proactively solicit clarification from users at decision points it identifies. It does not allow users to proactively interrupt Claude mid-task; it still depends on Claude reaching a pause in execution where it chooses to ask.
/status to see which managed source is currently active.
/permissions to reset the policy merge.
/status, which shows which managed source is currently active.
canUseTool callback state and resume it after the reviewer responds.
defer hook decision, which allows the process to exit and resume later from a persisted session while the callback remains pending.
canUseTool callback that auto-approves the operation after a configurable delay, which keeps the process alive while bounding the wait duration.
defer hook for long-running waits is not available in the Python SDK — only in the TypeScript SDK. Attributing this capability to the Python SDK directly contradicts the documented constraint.
defer hook decision, which lets the process exit and resume later from the persisted session. This is the only SDK-provided mechanism for handling long-lived human-in-the-loop waits at approval checkpoints.
changes.json plan file after analysis, validate it with a script before applying any changes, and surface specific error messages that allow Claude to iterate on the plan.
max_tokens in the API configuration to allow Claude to hold more conversation history before truncation occurs.
/compact at a natural pause point to replace the growing conversation with a structured summary, so startup content reloads and resolution state is preserved efficiently.
/context periodically to monitor usage by category, then manually delete earlier turns from the conversation log to reclaim context space.
/context provides a live breakdown of context usage with optimization suggestions — it is a diagnostic tool, not a remediation mechanism. Manually deleting earlier conversation turns is not a supported Claude Code workflow and could corrupt the session state or remove information Claude still needs for resolution.
/compact is explicitly designed to manage context growth mid-session: it replaces the accumulated conversation with a structured summary, most startup content (like CLAUDE.md and auto memory) reloads automatically, and the session can continue without a restart. This directly addresses context exhaustion while preserving resolution continuity — the core tradeoff in long-running support sessions.
issue_comment events.
/loop and /debug are prompt-based playbooks that let Claude orchestrate the work using its tools, so the pipeline gains Claude's judgment and tool-use capabilities without the team writing custom orchestration logic.
/loop or /debug brings Claude's reasoning and tool-use capabilities to bear on the task, replacing the need for the team to write custom orchestration logic in shell scripts.
output_tokens field undercounts tokens because it only reports tokens from the final model call, not all calls across the session, so the real total is higher than expected.
cache_read_input_tokens field represents tokens served from cache rather than processed fresh, which carry a lower per-token cost and reduce total spend relative to a naive input × price estimate.
input_tokens field includes both cached and uncached tokens in its count, so the effective price should be applied to the full input_tokens value to get the correct cost.
input_tokens reports only uncached input tokens, while cache_read_input_tokens tracks tokens served from the prompt cache. Cache reads carry a reduced per-token cost, which is why total spend falls below a naive calculation that multiplies all input tokens by the full price. The separation of these fields in the usage object is precisely what enables accurate cost tracking by distinguishing cheap cache-read tokens from full-price uncached tokens.
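The arithmetic above can be sketched as a small cost calculator. This is an illustrative sketch, not official billing code, and the per-million-token prices below are placeholder values, not real rates:

```python
def estimate_cost(usage, input_price, cache_read_price, output_price):
    """Estimate spend in dollars from a usage object.

    Prices are per million tokens. `usage` mirrors the API's usage
    fields: input_tokens counts only uncached input, while
    cache_read_input_tokens counts tokens served from the prompt
    cache at a reduced per-token rate.
    """
    return (
        usage["input_tokens"] * input_price
        + usage["cache_read_input_tokens"] * cache_read_price
        + usage["output_tokens"] * output_price
    ) / 1_000_000

# Placeholder prices: cache reads cost a tenth of fresh input here.
usage = {
    "input_tokens": 2_000,
    "cache_read_input_tokens": 98_000,
    "output_tokens": 1_000,
}
# Naive estimate: all 100k input tokens at the full input price.
naive = (2_000 + 98_000) * 3.00 / 1_000_000 + 1_000 * 15.00 / 1_000_000
actual = estimate_cost(usage, input_price=3.00,
                       cache_read_price=0.30, output_price=15.00)
print(f"naive: ${naive:.4f}  actual: ${actual:.4f}")
```

With a heavily cached prompt, the actual total lands well below the naive input × price figure, which is exactly the discrepancy the explanation describes.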
str_replace commands on any file that exceeds 10,000 characters, because edits require the full file to be in context first.
max_characters parameter is only honored by text_editor_20250728 and later versions, so the team must verify their tool version matches the model in use or the parameter will have no effect.
max_characters to a value lower than the file length increases token efficiency for viewing but requires the subagent to request multiple views or work with partial context when reasoning about content beyond the truncation point.
max_characters parameter controls truncation when viewing large files — it does not prevent edits or reject files. Reducing it preserves context window space (the stated goal) but means the subagent only sees a portion of the file per view, introducing a tradeoff: the agent must paginate its views or operate with incomplete file context when the relevant content lies beyond the truncation boundary. This is a genuine design tradeoff the team must account for.
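The pagination tradeoff can be illustrated with a plain-Python sketch. This is not the SDK's implementation; `view_chunk` is a hypothetical helper that mimics a viewer honoring a max_characters budget:

```python
def view_chunk(text, offset, max_characters):
    """Return one truncated view of `text`, plus the offset of the
    next view (or None when the whole file has been seen)."""
    chunk = text[offset:offset + max_characters]
    next_offset = offset + len(chunk)
    return chunk, (next_offset if next_offset < len(text) else None)

def paginate(text, max_characters):
    """Collect every view needed to see the whole file: a lower
    budget saves context per view but forces more round trips."""
    views, offset = [], 0
    while offset is not None:
        chunk, offset = view_chunk(text, offset, max_characters)
        views.append(chunk)
    return views

file_text = "x" * 25_000
views = paginate(file_text, max_characters=10_000)
print(len(views))  # three views: 10k + 10k + 5k characters
```

Each view costs context proportional to the budget, so shrinking max_characters trades per-view token savings for extra view requests — the design tradeoff named above.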
{"INVALID_JSON": "<malformed string>"}), properly escaping any special characters, before passing it back to Claude.
max_tokens limit without returning the malformed output to Claude, since invalid JSON cannot safely be included in any response block.
{"INVALID_JSON": "..."}) when it must be passed back to the model in an error response block. This approach preserves the original malformed data for debugging while keeping the outer response structurally valid — and critically, the wrapper must itself be valid JSON, requiring proper escaping of quotes and special characters inside the string value.
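The wrapper can be produced safely by letting a JSON serializer handle the escaping rather than building the string by hand. A minimal sketch following the INVALID_JSON convention described above:

```python
import json

def wrap_malformed_output(raw: str) -> str:
    """Wrap malformed tool output in a valid JSON envelope so it can
    be returned to the model. json.dumps escapes quotes, backslashes,
    and control characters inside the string value, keeping the outer
    structure parseable while preserving the raw data for debugging."""
    return json.dumps({"INVALID_JSON": raw})

# A malformed fragment with an unescaped quote and a dangling brace.
raw = '{"name": "widget", "desc": "3\\" bolt'
wrapped = wrap_malformed_output(raw)

# The envelope is valid JSON, and the original bytes round-trip intact.
recovered = json.loads(wrapped)["INVALID_JSON"]
```

Hand-rolling the escaping (string concatenation with quotes) is exactly where this pattern goes wrong; delegating to the serializer guarantees the outer structure stays valid.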
@claude review before each merge, which catches regressions on every push and auto-resolves threads, with the tradeoff that reviews only run when explicitly requested.
@claude review once after each push to trigger the next run.
&& or ; operators; for sequential tool calls issued separately, each call starts a fresh shell regardless of session configuration.
query() callbacks, because it supports in-process logic and its scoping to the main session is sufficient for subagent write protection.
settings.json, accepting that the hook logic must be expressed as a shell command, HTTP endpoint, MCP tool, prompt, or agent — rather than inline in-process code — because filesystem hooks fire in both the main agent and any subagents it spawns.
query() callbacks and replicate it as a filesystem hook, so both the main session and subagents are covered without any tradeoffs.
"prompt" type, because LLM-evaluated prompts are the only hook mechanism capable of reaching subagents spawned during a session.
settings.json) fire in the main agent AND any subagents it spawns, making them the only hook type that can enforce constraints across the full execution tree. The tradeoff is that logic must be expressed as one of the supported command types (command, http, mcp_tool, prompt, or agent) rather than as inline in-process code. When subagent coverage is the critical constraint, the filesystem hook is the correct choice despite losing in-process integration.
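A filesystem hook of the kind described might look like the sketch below. The matcher pattern and script path are illustrative assumptions; the exact schema should be verified against the hooks documentation for the Claude Code version in use:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/block-protected-writes.sh"
          }
        ]
      }
    ]
  }
}
```

Because the hook lives in settings.json rather than in-process callback code, it applies to subagents as well as the main agent — the coverage property the explanation hinges on.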
agent.message by logging tool invocation details, handle agent.tool_use by displaying text output to the user, and treat session.status_idle as an error condition requiring retry.
agent.message by extracting and displaying text blocks from its content array, handle agent.tool_use by recording the tool name for observability, and break the stream loop upon receiving session.status_idle.
agent.message by extracting and displaying text blocks from its content array, treat agent.tool_use as an error because agents should not invoke tools autonomously, and poll the session endpoint for completion instead of relying on session.status_idle.
agent.message events as intermediate noise, handle agent.tool_use by extracting text output, and break the stream loop only when the stream connection closes rather than on session.status_idle.
agent.message carries content blocks (text is extracted from the content array), agent.tool_use carries a name field indicating which tool the agent is using (suitable for logging/observability), and session.status_idle is the designated signal that the agent has finished — the loop breaks upon receiving it. This three-way distinction is the correct structured event-handling pattern for SSE streams from managed agent sessions.
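The three-way handling can be sketched as a plain event loop. The event shapes below follow the field names mentioned above (a content array of blocks, a name field, and session.status_idle as the terminal signal), simulated here with an in-memory list rather than a live SSE connection:

```python
def handle_events(events):
    """Process managed-session events: collect text from
    agent.message, record tool names from agent.tool_use, and
    stop on session.status_idle."""
    texts, tools = [], []
    for event in events:
        if event["type"] == "agent.message":
            # Extract the text blocks from the content array.
            for block in event["content"]:
                if block["type"] == "text":
                    texts.append(block["text"])
        elif event["type"] == "agent.tool_use":
            # Record the tool name for observability.
            tools.append(event["name"])
        elif event["type"] == "session.status_idle":
            # The agent has finished; exit the stream loop.
            break
    return texts, tools

stream = [
    {"type": "agent.tool_use", "name": "bash"},
    {"type": "agent.message",
     "content": [{"type": "text", "text": "Done."}]},
    {"type": "session.status_idle"},
    {"type": "agent.message",
     "content": [{"type": "text", "text": "never seen"}]},
]
texts, tools = handle_events(stream)
```

Note that the event after session.status_idle is never processed — breaking on the idle signal, rather than waiting for the connection to close, is what makes the loop terminate promptly.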
max_tokens limit per teammate session, disable CLAUDE.md loading for teammates, and run teammates sequentially instead of in parallel.
/usage command to cap spending per teammate, reduce the number of MCP servers registered globally, and switch to a cheaper model tier only for idle teammates.
.claude/agents/ at startup, which is faster than parsing a large system prompt at runtime, reducing overall latency for each phase.