Prompt Caching with Claude: Cut API Costs by 90% on Repeated Context

Source: DEV Community
If you're sending the same large context (system prompt, documents, tool definitions) with every API call, you're paying full price each time. Prompt caching stores that context once and bills subsequent reads at a fraction of the normal input-token rate.

## How It Works

A normal API call bills full input tokens on every request. With caching:

- First call: input tokens billed, plus a small cache write fee
- Subsequent calls: only a cache read fee (about 90% cheaper than full input tokens)

## Marking Content for Caching

```python
import anthropic

client = anthropic.Anthropic()

# Large document you send on every request (~50k tokens)
with open('large_codebase_context.txt') as f:
    SYSTEM_DOCS = f.read()

response = client.messages.create(
    model='claude-opus-4-6',
    max_tokens=1024,
    system=[
        {
            'type': 'text',
            'text': SYSTEM_DOCS,
            'cache_control': {'type': 'ephemeral'}  # Cache this block
        }
    ],
    messages=[{'role': 'user', 'content': 'What does the auth module do?'}]
)
```

## Checking Cache Usage

Inspect `response.usage` to confirm the cache is being hit:

```python
usage = response.usage
print(f"cache writes: {usage.cache_creation_input_tokens}")
print(f"cache reads:  {usage.cache_read_input_tokens}")
print(f"uncached:     {usage.input_tokens}")
```
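To see where the "90% cheaper" figure lands in practice, here is a back-of-the-envelope sketch. The dollar price per million tokens is hypothetical, and the 1.25x write / 0.1x read multipliers are assumptions modeling the fee structure described above (a small write surcharge, reads at a tenth of full price):

```python
# Back-of-the-envelope cost comparison for a cached 50k-token context.
# BASE_PRICE_PER_MTOK is a hypothetical price; the multipliers model
# a 25% cache-write surcharge and cache reads at 10% of full price.
BASE_PRICE_PER_MTOK = 15.00   # hypothetical $ per million input tokens
CACHE_WRITE_MULT = 1.25       # assumed write surcharge
CACHE_READ_MULT = 0.10        # assumed read discount (90% cheaper)

def cost(tokens: int, multiplier: float = 1.0) -> float:
    """Dollar cost of billing `tokens` input tokens at the given rate."""
    return tokens / 1_000_000 * BASE_PRICE_PER_MTOK * multiplier

context_tokens = 50_000
calls = 100

# Without caching: full price on every call.
without_cache = calls * cost(context_tokens)

# With caching: one write, then cheap reads on the remaining calls.
with_cache = (cost(context_tokens, CACHE_WRITE_MULT)
              + (calls - 1) * cost(context_tokens, CACHE_READ_MULT))

print(f"without caching: ${without_cache:.2f}")
print(f"with caching:    ${with_cache:.2f}")
print(f"savings:         {1 - with_cache / without_cache:.0%}")
```

Under these assumptions, 100 calls drop from $75.00 to about $8.36, an overall saving just under 90%; the write surcharge only matters on the first call, so savings approach the full read discount as call volume grows.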