Open Source · MIT Licensed · Memory layer for agentic AI

Memory Infrastructure
for Agentic AI

Chengeta AI gives intelligent agents a persistent, high-performance memory layer across frameworks, workflows, and environments — so they recall what they have already done instead of paying to recompute it.

Get Started →View Documentation ★ GitHub

$pip install chengeta-ai

Memory Layers

Storage Backends

13+

Framework Adapters

90%

Cost Reduction

Why Chengeta

Agentic systems are expensive because they are forgetful. Chengeta AI is the layer that makes them remember.

∞

Persistent by Design

Memory survives turns, sessions, and restarts. What an agent learns once is preserved — chengeta means to keep safe.

Context memory →

◇

Semantic Recall

Returns saved answers for semantically similar queries via cosine similarity, with an adaptive auto-tuning threshold.

Semantic layer →

⚡

Microsecond Hits

In-memory LRU hits return in microseconds — orders of magnitude faster than an LLM or vector round-trip.

Backends →

◼

Framework-Agnostic

One memory layer across LangChain, LangGraph, CrewAI, AutoGen, Agno, A2A, OpenAI, Anthropic, Gemini and more.

Adapters →

⬡

Production Backends

In-Memory, Disk, Redis, tiered L1+L2, FAISS, Chroma, Qdrant, Weaviate, Pinecone — pick your scale.

Storage →

⌖

Observable & Safe

Tag-based invalidation, stampede protection, multi-tenant namespacing, Prometheus and OpenTelemetry export.

Observability →

Supported Frameworks

Drop in anywhere — no call signatures change.

LangChainLangGraphAutoGenCrewAIAgnoA2AOpenAIAnthropicGeminiGoogle ADKLlamaIndexOpenAI AgentsClaude Agent

Eight Memory Layers

Each layer preserves one kind of artifact, with serialization tuned to its data.

ResponseCache

LLM output by model + messages + params

EmbeddingCache

Vectors by model + text, stored as bytes

RetrievalCache

Documents by query + retriever + top-k

ContextCache

Conversation turns by session + index

SemanticCache

Answers for cosine-similar queries

AdaptiveSemanticCache

Semantic + auto-tuning threshold

StreamingResponseCache

Buffered stream replay as a generator

PromptCacheLayer

Provider cache_control + savings tracking

Performance Benchmarks

A warm cache turns network round-trips into local reads. Figures are representative of typical workloads.

Operation	Cold (miss)	Warm (hit)	Speed-up
In-memory cache hit	~600 ms (LLM call)	< 0.1 ms	6,000×
Embedding reuse	~80 ms (embed API)	< 0.2 ms	400×
Retrieval recall	~22 ms (vector DB)	< 0.3 ms	70×
Semantic match	~600 ms (LLM call)	~2 ms	300×

Architecture

A request flows through adapter and middleware into the memory layers. On a hit, it returns instantly. On a miss, the real call runs once, the result is preserved, and every future request is served from memory.

Framework Adapter

LangChain · CrewAI · OpenAI …

→

Middleware

wrap any callable

→

Memory Layers

8 cache layers

→

Backend

KV or Vector store

Miss → real API call → preserve in memory → return