Q54 of 60 — CCA Practice Exam

Structured Data Extraction

A structured data extraction pipeline processes 150 regulatory filings per night, each averaging 35,000 tokens. Engineers report that Claude consistently extracts accurate data from the first few sections of each filing but misses or misattributes key values (such as penalty amounts and effective dates) that appear in dense boilerplate near the middle of the document. The current prompt places the extraction instructions and field definitions first, followed by the full filing text.

What change would most directly improve extraction accuracy for fields buried in the middle of long documents?

A. Reorder the prompt so the full filing text appears before the extraction instructions and field definitions, and require Claude to quote the relevant passages before populating each field.

B. Split each 35,000-token filing into overlapping 8,000-token chunks and run a separate extraction call per chunk, then merge the results downstream.

C. Add a stronger system prompt instruction emphasizing that Claude must read the entire document carefully before extracting any values, particularly those in the middle sections.

D. Increase max_tokens to give Claude more room to reason through the full filing before committing to extracted field values.