Cache Layers
Chengeta AI ships eight purpose-built cache layers. Each targets a distinct stage of the AI pipeline with optimized serialization and a consistent get / set / get_or_* interface.
At a Glance
| Layer | Class | What it caches | Key |
|---|---|---|---|
| Response | ResponseCache | LLM completions | model + messages + params |
| Streaming | StreamingResponseCache | Streaming LLM chunks (buffered) | model + messages + params |
| Embedding | EmbeddingCache | np.ndarray vectors | text + model |
| Retrieval | RetrievalCache | Document lists | query + retriever + top_k |
| Context | ContextCache | Conversation history | session ID + turn index |
| Semantic | SemanticCache | Any value by meaning (exact + cosine) | exact key or vector similarity |
| Adaptive Semantic | AdaptiveSemanticCache | Semantic cache with auto-tuning threshold | same as SemanticCache |
| Prompt Cache | PromptCacheLayer | Provider cache_control injection + savings | — (wraps API calls) |
Pipeline Diagram
Shared Design Principles
- Constructor takes a
CacheManager— exceptSemanticCache/AdaptiveSemanticCachewhich wire their own backends. get(...)returnsNoneon miss — never raises.set(...)accepts optionalttl— falls back toTTLPolicywhen omitted.get_or_*convenience methods — compute + cache in one call.- Pluggable serializer — all layers accept
serializer=param.
Next Steps
- ResponseCache — cache LLM completions
- StreamingResponseCache — cache streaming LLM output
- SemanticCache — meaning-aware caching
- AdaptiveSemanticCache — auto-tuning threshold
- PromptCacheLayer — provider prompt cache integration
- EmbeddingCache — cache dense vectors
- RetrievalCache — cache retriever results
- ContextCache — cache conversation history