Add Command R and Command R+ models #2548
Conversation
prod_env/credentials.conf
Outdated
@@ -0,0 +1 @@
cohereApiKey: TODO
Change this to ensure things work
Awesome, thanks for this pull request! I left some high-level suggestions.
src/helm/clients/cohere_client.py
Outdated
""" | ||
# Cohere has a Python SDK, but it requires additional post-processing to convert their response | ||
# objects (`Generations`) to JSON, the form the cache expects the responses to be in. | ||
response = requests.request( | ||
method="POST", | ||
url=get_cohere_url(CohereClient.GENERATE_ENDPOINT), | ||
url=get_cohere_url(CohereClient.CHAT_ENDPOINT), |
I'd suggest splitting this into CohereTextClient and CohereChatClient (which can inherit from a common CohereAbstractClient base class for shared logic). Then Command R can use CohereChatClient while the other models keep using CohereTextClient.
Good point! Although I think going forward we will only be supporting chat and not generate for Cohere models. Do you think I should still go with the split approach?
P.S. Is there any way to check what I would break if I removed the no-longer-supported generate code? Do you know if it is possible to delete some of the old Cohere code and move forward with the new paradigm?
I am curious if we could even remove some old Cohere stuff.
In the latest version of this PR, I explored some deletions we can make, such as tokenizer removals. Let me know what you think!
Regarding checking what breaks: you should be able to run e.g. cohere/command-light and that should still work.
I'm inclined to keep the old text API endpoint support, so that people can still use it for reproducing old results (at least until the text API endpoint is gone).
As for the tokenizer, can the command-r tokenizer be used for the older command models, or do those use a different tokenizer? Also, is there still a (small) length limit on the tokenizer API? If either of those constraints no longer holds, it would simplify a lot of HELM code here.
# However, even sending a request with 60,000 characters sometimes fails, so we set the
# maximum length to 50,000, which is about 8,333 tokens.
# TODO: followed up with Cohere support with an example of a failure case
TOKENIZE_API_MAX_TEXT_LENGTH: int = 50_000
Does this limit still exist?
Not any more!
-    1 <= len(text) <= CohereTokenizer.TOKENIZE_API_MAX_TEXT_LENGTH
-), f"Invalid text length: {len(text)}. Valid length: [1..{CohereTokenizer.TOKENIZE_API_MAX_TEXT_LENGTH:,d}]"
+assert (1 <= len(text)), f"Invalid empty text with length: {len(text)}. Valid length: >= 1"
+request["model"] = "command-r"  ### To do
You can read this from the request (and also avoid mutating the input parameter):

if request["tokenizer"] == "cohere/command-r":
    request = {"model": "command-r", **request}
# now accept a num_generations argument in the JSON and will always return an array of generations"
# Note that the API version is decoupled from the model version.
DEFAULT_API_VERSION: str = "2021-11-08"

TOKENIZE_ENDPOINT: str = "tokenize"
You could consider using the Hugging Face tokenizer version of the Command R tokenizer to avoid using the API, like so.
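For example, a rough sketch (the Hugging Face repository id is an assumption; use whichever repo actually hosts the Command R tokenizer, and note it may be gated behind a license acceptance):

from transformers import AutoTokenizer

# Load the Command R tokenizer locally instead of calling Cohere's tokenize API.
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")
token_ids = tokenizer.encode("Hello, world!", add_special_tokens=False)
print(len(token_ids), token_ids)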
Much cleaner, thank you!
prod_env/model_deployments.yaml
Outdated
max_sequence_length: 2048
client_spec:
  class_name: "helm.clients.cohere_client.CohereClient"
window_service_spec:
Related to the later note about tokenizers: if you use the Hugging Face tokenizer rather than the Cohere tokenizer API, then you can just delete window_service_spec (which exists to work around the length limit of the Cohere tokenizer API), which will make HELM use the default window service.
Thank you! Wonderful advice.
@yifanmai thank you so much for the advice. Let me know if you can take another look at things! I think we may need some thought on what to preserve / what to delete. I'm also curious what the consequences are of the limitations in the current implementation (only returning a single completion, and not returning tokens in the results). In total, I wonder if these limitations make it so that most of HELM evals are not possible? Or can we go far with just generative evals?
FYI: I will be traveling until 2024-04-10 (next Wednesday), so I might not respond to your comments until then.
It's okay to (1) only support single completions and (2) not have tokens in the results (you would set the tokens: List[Token] field to an empty array [] in RequestResult). Your current implementation looks fine and should allow you to run all the scenarios in HELM Lite, which is our "flagship" leaderboard.
You won't be able to run some other scenarios that require token logprobs (e.g. the language modeling scenarios).
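As a minimal sketch (class and field names assume HELM's request dataclasses at the time of this PR and may differ or have since been renamed), the chat client could return something like:

from helm.common.request import RequestResult, Sequence

def build_chat_result(response_text: str, request_time: float, cached: bool) -> RequestResult:
    # Single completion, with tokens left empty because the chat endpoint does not
    # return per-token logprobs.
    completion = Sequence(text=response_text, logprob=0, tokens=[])
    return RequestResult(
        success=True,
        cached=cached,
        request_time=request_time,
        completions=[completion],
        embedding=[],
    )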
The contract is that models marked as FULL_FUNCTIONALITY_TEXT_MODEL_TAG should implement all these features and can support any scenario, whereas models marked as LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG only need to implement returning text, and are only guaranteed to work on HELM Lite scenarios.
We currently don't do any kind of fine-grained capabilities validation (i.e. checking which scenarios work on which models). We may add that in the future.
Also, could you rebase / merge with #2551, which requires renaming?
No problem! Will be sure to rebase, and will make sure we support Command Light with the mentioned abstractions. P.S. I am OOO next week.
There doesn't appear to be a tokenizer length cap anymore (I tried some large lengths), but I would need to use the right tokenizer for Command Light (which is currently not the Command R one).
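One rough way to spot-check the length cap (the endpoint shape and response fields here are assumptions based on Cohere's v1 REST API, so adjust as needed):

import os
import requests

# Send a tokenize request well above the old 50,000-character limit and see whether it succeeds.
response = requests.post(
    "https://api.cohere.ai/v1/tokenize",
    headers={"Authorization": f"Bearer {os.environ['COHERE_API_KEY']}"},
    json={"text": "hello " * 20_000, "model": "command-light"},  # ~120,000 characters
)
response.raise_for_status()
print(len(response.json()["tokens"]))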
Hi @andyt-cohere, would you be okay if I made changes directly on your GitHub fork and branch? This is pretty close to being merge-ready, and I can spend some time next week working on it.
@andyt-cohere I reverted some of the changes, so now this change only handles adding the Command R model. I'm planning to add the new tokenizer in pull request #2623, and to do the deletion of the old tokenizers and window services in another future pull request.
Co-authored-by: Yifan Mai <yifan@cs.stanford.edu>
To do:
cohere_client.py

Tested basic setup via:
helm-run --run-entries "mmlu:subject=anatomy,model_deployment=cohere/command-r" --suite v1 --max-eval-instances 1 --disable-cache
and likewise for command-r-plus.