feat: add exact match caching #1717
Great to see this draft 🙂 This should give you an average distribution for last month, @ayulockin. This ignores rows which had the value 1 (assumed to be single evaluations).
Thanks @jjmachan, I will stress test on an evaluation set of 300 rows (off the top of my head). I think that would be a sufficient amount given that the 95th percentile is 120 rows and the average is 40 rows. Will benchmark a few methods and let you know. Super useful data points.
hey @ayulockin I was thinking: instead of benchmarking evaluation, we have to evaluate prompt inputs here, right? That will be a more accurate benchmark. We can synthetically create a few of these too, but a few questions
Caching should add some latency. I want to benchmark evaluation/testset generation (as a case study) with caching enabled and without it. The difference should not be noticeable imo. Running this is easy. I am not sure what exactly you mean by "evaluate prompt inputs here". My guess is that it is about unique key generation given a prompt input, whether it is deterministic (we get the same key when we give the same prompt input), the lookup speed/insertion speed, etc. I think with the evaluation benchmark I meant the same thing. So to answer your question on "what aspects are we benchmarking/measuring?": it's latency. Also, because we are saving the raw output as the value for the key, we need to be careful about disk usage too imo (but it should not be such a big deal considering the percentile usage data you shared -- one will never run LLM based eval on 100k samples 😅).
I am generating key like this: Line 81 in 55e7472
It's quite generic, but I want to check it with the embeddings as well (embedding models are very cheap, so it's up to you if you want this to be cached too). TLDR;
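To illustrate the determinism requirement discussed above (the same prompt input must always map to the same key), here is a minimal sketch. It is not the code referenced at "Line 81 in 55e7472"; `make_cache_key` and its normalization rules are hypothetical and only show one way to avoid memory addresses leaking into the key.

```python
import hashlib
import json
from typing import Any


def make_cache_key(func_name: str, args: tuple, kwargs: dict) -> str:
    """Hypothetical helper: build a deterministic key from call arguments."""

    def normalize(value: Any) -> Any:
        # Primitives are JSON-serializable and stable across runs.
        if value is None or isinstance(value, (str, int, float, bool)):
            return value
        if isinstance(value, (list, tuple)):
            return [normalize(v) for v in value]
        if isinstance(value, dict):
            return {str(k): normalize(v) for k, v in value.items()}
        # Plain objects: use class name plus attributes instead of the
        # default repr, which embeds a memory address. Assumes no cycles.
        if hasattr(value, "__dict__"):
            return {"__class__": type(value).__name__, **normalize(vars(value))}
        # Last resort: the type name only (objects of one type will collide).
        return f"<{type(value).__name__}>"

    payload = json.dumps(
        {"func": func_name, "args": normalize(args), "kwargs": normalize(kwargs)},
        sort_keys=True,  # keyword order must not change the key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Hashing a canonical JSON payload keeps the key length fixed regardless of prompt size, which matters when the key is stored on disk next to the raw output.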
Ran evaluation with and without caching on the rungalileo/ragbench "tech" subset with 314 samples in the test set.
Without caching: 6 minutes 32.21 seconds
With caching: 5 minutes 46.31 seconds
Hence Running the same eval that was cached took only 3.27 seconds. The resulting Thoughts:
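The cold-run versus cached-run comparison described above can be reproduced in miniature. This is only a sketch under stated assumptions: `CountingLLM` is a hypothetical stand-in for a slow LLM call, a plain dict plays the role of the cache, and the timings it prints have nothing to do with the ragbench numbers reported in this thread.

```python
import time
from typing import Callable, Dict, List, Optional


class CountingLLM:
    """Hypothetical stand-in for a slow LLM; counts real invocations."""

    def __init__(self, delay: float = 0.01) -> None:
        self.delay = delay
        self.calls = 0

    def generate(self, prompt: str) -> str:
        self.calls += 1
        time.sleep(self.delay)  # simulate network/model latency
        return prompt.upper()


def run_eval(
    llm: CountingLLM, prompts: List[str], cache: Optional[Dict[str, str]] = None
) -> List[str]:
    """Run every prompt, consulting the exact-match cache first if given."""
    outputs = []
    for prompt in prompts:
        if cache is not None and prompt in cache:
            outputs.append(cache[prompt])
            continue
        result = llm.generate(prompt)
        if cache is not None:
            cache[prompt] = result
        outputs.append(result)
    return outputs


def timed(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


llm = CountingLLM()
prompts = [f"question {i}" for i in range(20)]
cache: Dict[str, str] = {}

cold = timed(lambda: run_eval(llm, prompts, cache))  # fills the cache
warm = timed(lambda: run_eval(llm, prompts, cache))  # served from the cache
print(f"cold: {cold:.3f}s, warm: {warm:.3f}s, real LLM calls: {llm.calls}")
```

Counting real calls alongside wall-clock time makes it easy to confirm that a fast second run comes from cache hits rather than from noise.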
this actually doesn't hold for testset generation, especially since there are a lot of transforms involved which process a lot of documents
one area that might perform differently is testset generation, which will behave differently compared to things that are cached when evaluating. This is why I thought we would consider a different benchmark, but that is overkill. Instead, what we could do is benchmark how caching affects the testset generation module too - what do you think?
how users will use these configs is the question
You are right. Test set generation is where I am right now.
In memory should be an option for the users -- maybe test generation benefits from it (just started investigating). The default should be
I think I have a way to do it cleanly. Will let you know.
The caching works fine with test generation. But I faced a few issues from my dev branch while running test generation so, did a fresh install in a new environment by cloning directly from the I have documented them here #1718. I am either doing something stupid or the issues are real.
thanks a lot for the update @ayulockin 🙂 should we aim to just work on diskcache for now and then open an issue to get more feedback on this
@jjmachan the PR is ready for review.
@jjmachan moved caching to the
src/ragas/llms/base.py
Outdated
@cacher()
async def generate(
    self,
    prompt: PromptValue,
not sure about this approach
@ayulockin there are a couple of things here
because cacher is a decorator, users don't have control over this other than the environment variable, so this has the same problem as this being in PydanticPrompt. In my mind there were 2 use cases
- user has to enable/disable caching
- user has to be able to change the caching service
I'll write down how I was thinking about this
1. enable caching
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI
gpt4o = ChatOpenAI()
evaluator_llm = LangchainLLMWrapper(gpt4o, caching=True)
# now you can use the LLM as you want
...
the same applies for embeddings too - this was something missing in this PR.
2. customize caching
# implement a cache backend from the interface
from ragas.cache import CacheInterface
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI

# maybe with Redis backend
class RedisCacher(CacheInterface):
    ...  # implementation goes here

# use with LLMWrapper itself
gpt4o = ChatOpenAI()
evaluator_llm = LangchainLLMWrapper(gpt4o, cacher=RedisCacher())
now one con I see with this approach is that there are 2 keywords we have to add, so for the "enable caching" use case I was thinking maybe we can join them
from ragas.cache import DefaultCacher
# we can name it DiskCacheCacher too
# use with LLMWrapper itself
gpt4o = ChatOpenAI()
evaluator_llm = LangchainLLMWrapper(gpt4o, cacher=DefaultCacher())
what do you think?
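To make the proposal concrete, here is a self-contained sketch of how an optional `cacher` argument could work. Everything in it is an assumption for illustration: the `get`/`set`/`has_key` method names, `InMemoryCacher`, and `CachedLLM` are hypothetical, and the local `CacheInterface` is a stand-in, not the class from `ragas.cache`.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Optional


class CacheInterface(ABC):
    """Local stand-in for the proposed interface; method names are assumed."""

    @abstractmethod
    def get(self, key: str) -> Any: ...

    @abstractmethod
    def set(self, key: str, value: Any) -> None: ...

    @abstractmethod
    def has_key(self, key: str) -> bool: ...


class InMemoryCacher(CacheInterface):
    """Simplest possible backend: a dict that lives for the process lifetime."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value

    def has_key(self, key: str) -> bool:
        return key in self._store


class CachedLLM:
    """Hypothetical wrapper: caching is enabled only when a cacher is passed."""

    def __init__(
        self, llm: Callable[[str], str], cacher: Optional[CacheInterface] = None
    ) -> None:
        self.llm = llm
        self.cacher = cacher

    def generate(self, prompt: str) -> str:
        if self.cacher is None:
            return self.llm(prompt)  # caching disabled
        if self.cacher.has_key(prompt):
            return self.cacher.get(prompt)  # exact-match hit
        result = self.llm(prompt)
        self.cacher.set(prompt, result)
        return result
```

With a single keyword, passing nothing disables caching and passing any backend both enables it and selects the service, which covers the two use cases listed above.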
Hey @jjmachan, this is great. Let me quickly implement this.
This is clearly a cleaner approach from a user pov (env can be scary especially in a big project). My approach was to make caching a hidden thing with just the control to turn on/off and select backend but I find your idea more appealing.
I went with the
Looks great 🥳
This PR introduces a robust caching abstraction to improve performance and prevent progress loss during API calls (e.g., LLM or embedding requests). This was brought up in explodinggradients#1522 and later added as an enhancement request in explodinggradients#1602.
Key Features:
- CacheInterface with DiskCacheBackend (using diskcache) and InMemoryCacheBackend (not tested yet).
- @cacher decorator to seamlessly apply caching to both synchronous and asynchronous functions. This can act as a singular interface. Additionally, I can create a CacheMixin class from which other classes can inherit. But personally liked the elegance of a decorator.
Questions:
- What would be a good assumption for the size of the dataset used for evaluation/number of LLM calls made? Asking this so that I can plan benchmarking accordingly. cc: @jjmachan
- The implemented _generate_cache_key uses the args and kwargs to find attributes that should be deterministic. For now I am excluding keys that have a memory address. This can maybe be done more gracefully. Or we can leverage the __repr__ or __hash__ written for PydanticPrompt, but I am not sure if this is the consensus for writing most core components in the lib.
Few obvious TODOs: