Claude Code for Continuous Integration
Your CI pipeline uses Claude Code to auto-implement utility functions flagged by failing tests. Over the past two weeks, 34% of the auto-implemented functions have passed Claude's internal review but failed the actual test suite when the PR ran, so a developer had to intervene manually before merge. Logs show that Claude generates plausible implementations without any concrete pass/fail criteria to evaluate them against. What change would most effectively reduce the implementation failure rate?