Provider Prompt Caching + Chengeta AI
Track Anthropic and OpenAI server-side prompt cache savings alongside your application-level cache hits.
Install
pip install anthropic
Example — Anthropic with cache_control injection
import anthropic
from chengeta_ai import CacheManager, InMemoryBackend, CacheKeyBuilder
from chengeta_ai.layers.prompt_cache import PromptCacheLayer
client = anthropic.Anthropic()
manager = CacheManager(backend=InMemoryBackend(), key_builder=CacheKeyBuilder())
layer = PromptCacheLayer(metrics=manager.metrics, min_chars_to_cache=512)
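# Long system prompt: the layer marks it with cache_control so Anthropic can
# serve it from its prompt cache on later calls. Anthropic only caches prefixes
# above a model-specific minimum token count, so a real prompt must be longer
# than this placeholder.
SYSTEM = "You are an expert Python architect... " * 20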
response = layer.anthropic_create(
    client,
    model="claude-haiku-4-5",
    max_tokens=256,
    system=SYSTEM,
    messages=[{"role": "user", "content": "Design a thread-safe LRU cache"}],
)
snap = manager.metrics.snapshot()
print(f"Provider cache hits: {snap['provider_cache_hits']}")
print(f"Tokens saved: {snap['estimated_tokens_saved']:,}")
print(f"Cost saved: ${snap['estimated_cost_saved_usd']:.4f}")
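For readers wondering what "cache_control injection" amounts to, here is a minimal sketch of the idea. It is not the PromptCacheLayer implementation: the helper name `build_system_blocks` and the exact threshold comparison are assumptions for illustration. Only the block shape and the `{"type": "ephemeral"}` marker come from Anthropic's Messages API.

```python
from typing import Any


def build_system_blocks(system: str, min_chars_to_cache: int = 512) -> list[dict[str, Any]]:
    """Wrap a system prompt in a text block, marking it cacheable if it is long enough.

    Hypothetical helper: the library may apply its threshold differently.
    """
    block: dict[str, Any] = {"type": "text", "text": system}
    if len(system) >= min_chars_to_cache:
        # Anthropic's marker for a cacheable prefix breakpoint.
        block["cache_control"] = {"type": "ephemeral"}
    return [block]


short_blocks = build_system_blocks("You are terse.")
long_blocks = build_system_blocks("You are an expert Python architect... " * 20)

print("cache_control" in short_blocks[0])  # False: below the character threshold
print("cache_control" in long_blocks[0])   # True: 760 characters, above 512
```

The resulting list can be passed as the `system` argument of `client.messages.create`. On the response, `usage.cache_creation_input_tokens` and `usage.cache_read_input_tokens` report how many tokens were written to or read from the provider cache.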
Two-level savings: response cache + prompt cache
from chengeta_ai import ResponseCache
response_cache = ResponseCache(manager)
prompt_layer = PromptCacheLayer(metrics=manager.metrics)
messages = [{"role": "user", "content": "Design a thread-safe LRU cache"}]
def ask(messages):
    # Check the response cache first (a hit eliminates the entire API call)
    cached = response_cache.get(messages, model_id="claude-haiku-4-5")
    if cached:
        return cached  # zero cost

    # Miss: call the API with prompt caching (saves input tokens on the live call)
    response = prompt_layer.anthropic_create(
        client,
        model="claude-haiku-4-5",
        system=SYSTEM,
        messages=messages,
        max_tokens=256,
    )
    response_cache.set(messages, response, model_id="claude-haiku-4-5", ttl=3600)
    return response

answer = ask(messages)
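To make the difference between the two levels concrete, the following sketch estimates what each one saves. Both function names are hypothetical, and so are the prices and token counts in the usage lines. The 0.1 read multiplier reflects Anthropic's published discount for cache reads but is left as a parameter. The sketch ignores the surcharge Anthropic applies when a prefix is first written to the cache, and it is not how the library computes `estimated_cost_saved_usd`.

```python
def prompt_cache_savings_usd(
    cache_read_tokens: int,
    input_price_per_mtok: float,
    read_multiplier: float = 0.1,
) -> float:
    """Savings on a live call: cached input tokens are billed at a fraction of base price."""
    full_price = cache_read_tokens * input_price_per_mtok / 1_000_000
    return full_price * (1.0 - read_multiplier)


def response_cache_savings_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Savings on a response-cache hit: the whole call is skipped, input and output."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Illustrative prices only (USD per million tokens); check current provider pricing.
IN_PRICE, OUT_PRICE = 1.0, 5.0

prompt_level = prompt_cache_savings_usd(2000, IN_PRICE)
response_level = response_cache_savings_usd(2050, 256, IN_PRICE, OUT_PRICE)
print(f"Prompt cache hit saves:   ${prompt_level:.4f}")
print(f"Response cache hit saves: ${response_level:.4f}")
```

A response-cache hit always saves more per request, but it only applies when the same messages repeat. The prompt cache still helps when the system prompt is shared and the user message changes.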
Run the cookbook example
export ANTHROPIC_API_KEY=your_key
uv run python -m cookbook.prompt_cache.agent