Skip to main content
Open Source · MIT Licensed · Memory layer for agentic AI

Memory Infrastructure
for Agentic AI

Chengeta AI gives intelligent agents a persistent, high-performance memory layer across frameworks, workflows, and environments — so they recall what they have already done instead of paying to recompute it.

Get Started →View Documentation★ GitHub
$pip install chengeta-ai
8
Memory Layers
9
Storage Backends
13+
Framework Adapters
90%
Cost Reduction

Why Chengeta

Agentic systems are expensive because they are forgetful. Chengeta AI is the layer that makes them remember.

Persistent by Design

Memory survives turns, sessions, and restarts. What an agent learns once is preserved — chengeta means to keep safe.

Context memory
Semantic Recall

Returns saved answers for semantically similar queries via cosine similarity, with an adaptive auto-tuning threshold.

Semantic layer
Microsecond Hits

In-memory LRU hits return in microseconds — orders of magnitude faster than an LLM or vector round-trip.

Backends
Framework-Agnostic

One memory layer across LangChain, LangGraph, CrewAI, AutoGen, Agno, A2A, OpenAI, Anthropic, Gemini and more.

Adapters
Production Backends

In-Memory, Disk, Redis, tiered L1+L2, FAISS, Chroma, Qdrant, Weaviate, Pinecone — pick your scale.

Storage
Observable & Safe

Tag-based invalidation, stampede protection, multi-tenant namespacing, Prometheus and OpenTelemetry export.

Observability

Supported Frameworks

Drop in anywhere — no call signatures change.

LangChainLangGraphAutoGenCrewAIAgnoA2AOpenAIAnthropicGeminiGoogle ADKLlamaIndexOpenAI AgentsClaude Agent

Eight Memory Layers

Each layer preserves one kind of artifact, with serialization tuned to its data.

01
ResponseCache
LLM output by model + messages + params
02
EmbeddingCache
Vectors by model + text, stored as bytes
03
RetrievalCache
Documents by query + retriever + top-k
04
ContextCache
Conversation turns by session + index
05
SemanticCache
Answers for cosine-similar queries
06
AdaptiveSemanticCache
Semantic + auto-tuning threshold
07
StreamingResponseCache
Buffered stream replay as a generator
08
PromptCacheLayer
Provider cache_control + savings tracking

Performance Benchmarks

A warm cache turns network round-trips into local reads. Figures are representative of typical workloads.

OperationCold (miss)Warm (hit)Speed-up
In-memory cache hit~600 ms (LLM call)< 0.1 ms6,000×
Embedding reuse~80 ms (embed API)< 0.2 ms400×
Retrieval recall~22 ms (vector DB)< 0.3 ms70×
Semantic match~600 ms (LLM call)~2 ms300×

Architecture

A request flows through adapter and middleware into the memory layers. On a hit, it returns instantly. On a miss, the real call runs once, the result is preserved, and every future request is served from memory.

Framework Adapter
LangChain · CrewAI · OpenAI …
Middleware
wrap any callable
Memory Layers
8 cache layers
Backend
KV or Vector store
Miss → real API call → preserve in memory → return

Join the Community

Chengeta AI is built in the open. Bring your frameworks, your scale, and your ideas.

Star on GitHub

Follow development and shape the roadmap.

Discussions

Ask questions and share patterns.

Contribute

Add a backend, adapter, or recipe.

Read the Docs

Guides, cookbook, and API reference.