English | 中文
GPT Cache is mainly used to cache users' question-answer data in ChatGPT-based services. It brings two benefits:
- Quick response to user requests: compared to large-model inference, searching the cache has much lower latency, so user requests are answered faster.
- Reduced service costs: most ChatGPT services are currently charged by the number of requests. If a user request hits the cache, fewer paid requests are made, which lowers the service cost.
If the idea is helpful to you, please feel free to give the project a star; it helps a lot.
I believe this caching layer is necessary for the following reasons:
- In domain-specific services built on ChatGPT, many question-answer pairs are quite similar to each other.
- For an individual user, the questions they ask ChatGPT tend to follow patterns tied to their occupation, lifestyle, personality, etc. For example, a programmer's questions are largely related to their work.
- If your ChatGPT service targets a large user group, grouping users by category increases the probability that similar questions hit the cache, which reduces service costs.
Note: you can quickly try the cache with the package below, but be aware that it may not be very stable yet.
pip install -i https://test.pypi.org/simple/ gpt-cache==0.0.2
If you just want exact-match caching of requests (that is, only two identical requests share a cache entry), you ONLY need TWO steps to use this cache:
- Cache init
from gpt_cache.core import cache
cache.init()
# If you set the API key with `openai.api_key = xxx`, replace that call with `cache.set_openai_key()`
cache.set_openai_key()
- Replace the original openai package
from gpt_cache.view import openai
# openai requests DON'T need ANY changes
answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "foo"},
    ],
)
To run locally with better results, you can use the Sqlite + Faiss + Towhee example, where Sqlite + Faiss is used for cache data management and Towhee is used for the embedding operation, as shown in the sketch below.
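A minimal sketch of that assembly, assuming the module paths and `cache.init` parameter names used by this early version's example (`get_ss_data_manager`, the `Towhee` embedding wrapper, `embedding_func` / `data_manager` / `evaluation_func`); the exact names may differ in your installed version:

```python
from gpt_cache.core import cache
from gpt_cache.cache.factory import get_ss_data_manager            # assumed helper
from gpt_cache.embedding.towhee import Towhee                      # assumed wrapper
from gpt_cache.similarity_evaluation.simple import pair_evaluation  # assumed module path

towhee = Towhee()  # English model paraphrase-albert-small-v2 by default
# Sqlite stores the scalar cache data, Faiss stores the embedding vectors
data_manager = get_ss_data_manager("sqlite", "faiss", dimension=towhee.dimension())

cache.init(
    embedding_func=towhee.to_embeddings,  # turn the request into a vector
    data_manager=data_manager,            # where cached questions and answers live
    evaluation_func=pair_evaluation,      # decide whether a hit is close enough
)
cache.set_openai_key()
```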
In actual production, or for a large user group, you need to pay more attention to the vector search part; you can get to know Milvus, or Zilliz Cloud, which lets you quickly experience Milvus vector retrieval.
More Docs:
- Support the openai chat completion normal and stream requests (see the sketch below)
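A minimal sketch of a streaming request through the cached wrapper, using the standard openai streaming interface; how a cache hit is returned for a streamed request may differ from a real API stream:

```python
from gpt_cache.view import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "foo"}],
    stream=True,  # request the answer chunk by chunk
)
for chunk in response:
    # each chunk carries an incremental delta, as in the vanilla openai package
    print(chunk["choices"][0]["delta"].get("content", ""), end="")
```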
- Get the top_k similar search results; top_k can be set when creating the data manager
- Support the cache chain, see `Cache#next_cache`:
from gpt_cache.core import cache, Cache  # the Cache class is assumed to live next to the global `cache` instance

# if the first cache misses, the request falls through to the backup cache
bak_cache = Cache()
bak_cache.init()
cache.init(next_cache=bak_cache)
- Decide whether to completely skip the current cache, that is, neither search the cache nor save the ChatGPT result, see `Cache#cache_enable_func` (a minimal sketch is shown below)
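A minimal sketch of such a switch, assuming `cache_enable_func` receives the request's positional and keyword arguments and returns a bool (the exact signature and init parameter are assumptions):

```python
from gpt_cache.core import cache

def only_cache_chat_model(*args, **kwargs):
    # return False to bypass the cache completely for this request:
    # no cache search is performed and the ChatGPT result is not saved
    return kwargs.get("model") == "gpt-3.5-turbo"

cache.init(cache_enable_func=only_cache_chat_model)
```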
- During the cache warm-up phase, you can skip the cache search but still save the result returned by ChatGPT to the cache; pass `cache_skip=True` in the `create` request:
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=mock_messages,
    cache_skip=True,  # do not search the cache, but save the returned answer
)
- Like Lego bricks, all modules can be custom assembled, including:
  - pre-embedding function: extracts the feature information from the original request, such as the prompt, the last message, etc.
  - embedding function: converts the feature information into a vector for the cache search; choose a model that fits your use case
  - data manager: manages the cache data, mainly the search and storage of cached entries
  - cache similarity evaluation function: can use the distance from the similarity search or an additional model to make sure the returned answer is accurate enough
  - post-processing of the cached answer list: for example, the most similar answer, a random one, or a custom combination
- Pre-embedding
  - get the last message in the request, see `pre_embedding.py#last_content` (a sketch is shown below)
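A sketch of what such a pre-embedding function could look like, assuming it receives the request's keyword arguments as a dict and returns the text to embed:

```python
def last_content(data, **params):
    # use only the content of the last message as the feature for the cache search
    return data.get("messages")[-1]["content"]
```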
- Embedding
  - towhee: English model `paraphrase-albert-small-v2`, Chinese model `uer/albert-base-chinese-cluecorpussmall`
  - openai embedding api (see the sketch below)
  - string: no change, the extracted text is used directly
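For the openai embedding option, a minimal embedding function could look like the sketch below; how it is registered with `cache.init` (the `embedding_func` parameter) is an assumption:

```python
import openai

def openai_embedding(text, **kwargs):
    # turn the extracted feature text into a vector with the OpenAI embedding API
    resp = openai.Embedding.create(input=text, model="text-embedding-ada-002")
    return resp["data"][0]["embedding"]

# assumed registration: cache.init(embedding_func=openai_embedding, ...)
```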
- Data Manager
- Similarity Evaluation
  - the search distance, see `simple.py#pair_evaluation`
  - towhee: the `roberta_duplicate` model, precise question-to-question comparison, supports at most 512 tokens
  - string: the cached request and the new request are compared by exact character match
  - np: uses `np.linalg.norm` to compute the distance between the two embedding vectors (see the sketch below)
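A sketch of a distance-based evaluation with numpy; the assumption that the evaluation function receives the two embedding vectors directly is illustrative, and the real signature may differ:

```python
import numpy as np

def np_evaluation(src_embedding, cache_embedding, **kwargs):
    # a smaller L2 distance between the two embeddings means more similar requests
    return np.linalg.norm(np.array(src_embedding) - np.array(cache_embedding))
```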
- Post Process (a minimal sketch is shown below)
  - choose the most similar answer
  - choose a random answer
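A sketch of these two behaviours as simple post-process functions, assuming they receive the list of candidate cached answers (the names here are illustrative):

```python
import random

def choose_first(answers):
    # candidates are assumed ordered by similarity, so return the best one
    return answers[0]

def choose_random(answers):
    # return any of the sufficiently similar cached answers
    return random.choice(answers)
```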
Want to help build GPT Cache? Check out our contributing documentation.
Thanks to my colleagues at Zilliz for their inspiration and technical support.