Prompt Caching with Claude: Cut API Costs by 90% on Repeated Context

Source: DEV Community
If you're sending the same large context (system prompt, documents, tool definitions) with every API call, you're paying full price each time. Prompt caching stores that context once and bills subsequent reads at a fraction of the normal input-token rate.

## How It Works

A normal API call bills full input tokens on every request. With caching:

- First call: input tokens billed, plus a small cache write fee
- Subsequent calls: only a cache read fee (about 90% cheaper than full input tokens)

## Marking Content for Caching

```python
import anthropic

client = anthropic.Anthropic()

# Large document you send on every request (~50k tokens)
with open('large_codebase_context.txt') as f:
    SYSTEM_DOCS = f.read()

response = client.messages.create(
    model='claude-opus-4-6',
    max_tokens=1024,
    system=[
        {
            'type': 'text',
            'text': SYSTEM_DOCS,
            'cache_control': {'type': 'ephemeral'}  # Cache this block
        }
    ],
    messages=[{'role': 'user', 'content': 'What does the auth module do?'}]
)
```

## Checking Cache Usage

Inspect `response.usage` to confirm the cache is being hit:

```python
usage = response.usage
print(f"cache writes: {usage.cache_creation_input_tokens}")
print(f"cache reads:  {usage.cache_read_input_tokens}")
print(f"uncached:     {usage.input_tokens}")
```
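To see where the "90% cheaper" figure lands in practice, here is a back-of-the-envelope sketch. The dollar price per million tokens is hypothetical, and the 1.25x write / 0.1x read multipliers are assumptions modeling the fee structure described above (a small write surcharge, reads at a tenth of full price):

```python
# Back-of-the-envelope cost comparison for a cached 50k-token context.
# BASE_PRICE_PER_MTOK is a hypothetical price; the multipliers model
# a 25% cache-write surcharge and cache reads at 10% of full price.
BASE_PRICE_PER_MTOK = 15.00   # hypothetical $ per million input tokens
CACHE_WRITE_MULT = 1.25       # assumed write surcharge
CACHE_READ_MULT = 0.10        # assumed read discount (90% cheaper)

def cost(tokens: int, multiplier: float = 1.0) -> float:
    """Dollar cost of billing `tokens` input tokens at the given rate."""
    return tokens / 1_000_000 * BASE_PRICE_PER_MTOK * multiplier

context_tokens = 50_000
calls = 100

# Without caching: full price on every call.
without_cache = calls * cost(context_tokens)

# With caching: one write, then cheap reads on the remaining calls.
with_cache = (cost(context_tokens, CACHE_WRITE_MULT)
              + (calls - 1) * cost(context_tokens, CACHE_READ_MULT))

print(f"without caching: ${without_cache:.2f}")
print(f"with caching:    ${with_cache:.2f}")
print(f"savings:         {1 - with_cache / without_cache:.0%}")
```

Under these assumptions, 100 calls drop from $75.00 to about $8.36, an overall saving just under 90%; the write surcharge only matters on the first call, so savings approach the full read discount as call volume grows.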