Code Generation with Claude Code
You've built a Claude Code–based automated code review agent that uses extended thinking with tool use. The agent calls a run_tests tool, then a fetch_coverage_report tool, then produces a final summary. After deploying, you notice the agent's summaries are often incoherent — it incorrectly describes test failures as passes and ignores relevant coverage gaps. Examining the API request logs, you see that on the second and third tool call turns, the thinking blocks from the previous assistant message are absent from the messages array being sent back to the API. What is the most likely root cause of the incoherent summaries?