Skip to main content

Cache Layers

Chengeta AI ships eight purpose-built cache layers. Each targets a distinct stage of the AI pipeline with optimized serialization and a consistent get / set / get_or_* interface.


At a Glance

LayerClassWhat it cachesKey
ResponseResponseCacheLLM completionsmodel + messages + params
StreamingStreamingResponseCacheStreaming LLM chunks (buffered)model + messages + params
EmbeddingEmbeddingCachenp.ndarray vectorstext + model
RetrievalRetrievalCacheDocument listsquery + retriever + top_k
ContextContextCacheConversation historysession ID + turn index
SemanticSemanticCacheAny value by meaning (exact + cosine)exact key or vector similarity
Adaptive SemanticAdaptiveSemanticCacheSemantic cache with auto-tuning thresholdsame as SemanticCache
Prompt CachePromptCacheLayerProvider cache_control injection + savings— (wraps API calls)

Pipeline Diagram


Shared Design Principles

  1. Constructor takes a CacheManager — except SemanticCache/AdaptiveSemanticCache which wire their own backends.
  2. get(...) returns None on miss — never raises.
  3. set(...) accepts optional ttl — falls back to TTLPolicy when omitted.
  4. get_or_* convenience methods — compute + cache in one call.
  5. Pluggable serializer — all layers accept serializer= param.

Next Steps