OpenAICacheAdapter

Wraps the chat.completions.create calls of openai.OpenAI and openai.AsyncOpenAI with response caching. Calls with an identical (model, messages, params) combination return the cached result without hitting the API.

Installation

pip install 'chengeta-ai[openai]'

Usage

Sync

import openai
from chengeta_ai import CacheManager, InMemoryBackend, CacheKeyBuilder
from chengeta_ai.adapters.openai_adapter import OpenAICacheAdapter

client = openai.OpenAI()
manager = CacheManager(
    backend=InMemoryBackend(),
    key_builder=CacheKeyBuilder(namespace="myapp"),
)
adapter = OpenAICacheAdapter(client, manager)

# First call: hits the API
response = adapter.chat_create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is semantic caching?"}],
)

# Second call with the same arguments: served from the cache, no API request
response = adapter.chat_create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is semantic caching?"}],
)

Async

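import asyncio

client = openai.AsyncOpenAI()
adapter = OpenAICacheAdapter(client, manager)  # manager as configured in the sync example

async def main():
    # await is only valid inside a coroutine
    response = await adapter.achat_create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is semantic caching?"}],
    )

asyncio.run(main())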

With Redis backend

from chengeta_ai.backends.redis_backend import RedisBackend

manager = CacheManager(
    backend=RedisBackend(url="redis://localhost:6379/0"),
    key_builder=CacheKeyBuilder(namespace="prod"),
)
adapter = OpenAICacheAdapter(client, manager)

Invalidate by model

adapter.invalidate_model("gpt-4o")  # remove all cached gpt-4o responses
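
One way model-scoped invalidation could be organised is to keep a per-model set of keys alongside the cached values. The sketch below is an illustration only: ModelIndexedStore and its methods are hypothetical names, not part of chengeta_ai, and the library may implement invalidate_model differently.

```python
from collections import defaultdict


class ModelIndexedStore:
    """Hypothetical in-memory store that remembers which keys belong to each model."""

    def __init__(self):
        self._values = {}
        self._keys_by_model = defaultdict(set)

    def set(self, model, key, value):
        # Store the value and record the key under its model.
        self._values[key] = value
        self._keys_by_model[model].add(key)

    def get(self, key):
        return self._values.get(key)

    def invalidate_model(self, model):
        # Remove every entry recorded for this model; return how many were dropped.
        keys = self._keys_by_model.pop(model, set())
        for key in keys:
            self._values.pop(key, None)
        return len(keys)


store = ModelIndexedStore()
store.set("gpt-4o", "k1", "response A")
store.set("gpt-4o", "k2", "response B")
store.set("gpt-4o-mini", "k3", "response C")
removed = store.invalidate_model("gpt-4o")
```

Entries for other models stay in place; only the keys indexed under the given model are dropped.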

How It Works

The cache key is a hash of the model, the messages, and all remaining parameters except stream. On a hit, the stored ChatCompletion response object is deserialised and returned in place of an API call. Because stream is excluded from the key, streaming and non-streaming calls with otherwise identical arguments share one cache entry.
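
The key derivation described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual CacheKeyBuilder: the helper name build_cache_key, the use of SHA-256, and the canonical-JSON encoding are all assumptions made for the example.

```python
import hashlib
import json


def build_cache_key(namespace: str, **kwargs) -> str:
    """Hypothetical helper: derive a deterministic key from call arguments."""
    # Drop `stream` so streaming and non-streaming calls map to the same key.
    params = {k: v for k, v in kwargs.items() if k != "stream"}
    # sort_keys makes the JSON independent of keyword-argument order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{namespace}:{digest}"


messages = [{"role": "user", "content": "What is semantic caching?"}]
key_plain = build_cache_key("myapp", model="gpt-4o", messages=messages)
key_stream = build_cache_key("myapp", model="gpt-4o", messages=messages, stream=True)
key_other = build_cache_key("myapp", model="gpt-4o-mini", messages=messages)
```

Here key_plain and key_stream are equal because stream is dropped before hashing, while key_other differs because the model is part of the key.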


API Reference

| Method | Description |
| --- | --- |
| chat_create(**kwargs) | Cached client.chat.completions.create() (sync) |
| achat_create(**kwargs) | Cached client.chat.completions.create() (async) |
| invalidate_model(model) | Invalidate all cached responses for a model |