Add Command R and Command R+ models #2548

Merged: 11 commits merged into stanford-crfm:main from command_r on May 20, 2024

Conversation

andyt-cohere (Contributor) commented Apr 3, 2024

To do:

  • PR description
  • Deciding what to keep / delete for old Cohere content
  • Impact of chat limitations (see notes in cohere_client.py)
  • Testing on supported evals?

Tested basic setup via: helm-run --run-entries "mmlu:subject=anatomy,model_deployment=cohere/command-r" --suite v1 --max-eval-instances 1 --disable-cache, and likewise for command-r-plus.

@@ -0,0 +1 @@
cohereApiKey: TODO
andyt-cohere (Contributor, Author):

Change this to ensure things work

yifanmai (Collaborator) left a comment:

Awesome, thanks for this pull request! I left some high level suggestions.

"""
# Cohere has a Python SDK, but it requires additional post-processing to convert their response
# objects (`Generations`) to JSON, the form the cache expects the responses to be in.
response = requests.request(
method="POST",
url=get_cohere_url(CohereClient.GENERATE_ENDPOINT),
url=get_cohere_url(CohereClient.CHAT_ENDPOINT),
yifanmai (Collaborator):

I'd suggest splitting this into CohereTextClient and CohereChatClient (which can inherit from a common CohereAbstractClient subclass for common logic), then Command R can use CohereChatClient while the other models can keep using CohereTextClient.
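
A rough sketch of that split might look like the following (class layout, base URL, endpoint paths, and method names are illustrative assumptions, not code from this PR or from HELM):

from abc import ABC, abstractmethod
from typing import Any, Dict

import requests


class CohereAbstractClient(ABC):
    """Shared request logic; subclasses only choose which endpoint to call."""

    BASE_URL: str = "https://api.cohere.ai/v1"  # assumed base URL, for illustration only

    def __init__(self, api_key: str):
        self.api_key = api_key

    @abstractmethod
    def endpoint(self) -> str:
        """Relative endpoint path, e.g. 'generate' or 'chat'."""

    def _post(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        # POST the raw JSON payload and return the parsed JSON response.
        response = requests.post(
            f"{self.BASE_URL}/{self.endpoint()}",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json=payload,
        )
        response.raise_for_status()
        return response.json()


class CohereTextClient(CohereAbstractClient):
    """Older command models keep using the text (generate) API."""

    def endpoint(self) -> str:
        return "generate"


class CohereChatClient(CohereAbstractClient):
    """Command R / Command R+ use the chat API."""

    def endpoint(self) -> str:
        return "chat"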

andyt-cohere (Contributor, Author) commented Apr 4, 2024:

Good point! Although I think going forward we will only be supporting chat and not generate for Cohere models.

Do you think I should still go with the split approach?

P.S. Is there any way to check what I would break if I removed the no-longer-supported generate code? Do you know whether it is possible to delete some of the old Cohere code and move forward with the new paradigm?

andyt-cohere (Contributor, Author):

I am curious if we could even remove some old Cohere stuff.

In the latest version of this PR, I explored some deletions we can make, such as tokenizer removals. Let me know what you think!

yifanmai (Collaborator):

Regarding checking what breaks - you should be able to run e.g. cohere/command-light and that should still work.

I'm inclined to keep the old text API endpoint support, so that people can still use them for reproducing old results (at least until the text API endpoint is gone).

As for the tokenizer, can the command-r tokenizer be used for the older command models, or do those use a different tokenizer? Also, is there still a (small) length limit on the tokenizer API? If either of those assumptions is no longer true, it would simplify a lot of HELM code here.

# However, even sending a request with 60,000 characters sometimes fails, so we set the
# maximum length to 50,000, which is about 8,333 tokens.
# TODO: followed up with Cohere support with an example of a failure case
TOKENIZE_API_MAX_TEXT_LENGTH: int = 50_000
yifanmai (Collaborator):

Does this limit still exist?

andyt-cohere (Contributor, Author):

Not any more!

-    1 <= len(text) <= CohereTokenizer.TOKENIZE_API_MAX_TEXT_LENGTH
-), f"Invalid text length: {len(text)}. Valid length: [1..{CohereTokenizer.TOKENIZE_API_MAX_TEXT_LENGTH:,d}]"
+assert (1 <= len(text)), f"Invalid empty text with length: {len(text)}. Valid length: >= 1"
+request["model"] = "command-r" ### To do
yifanmai (Collaborator):

You can read this from the request (and also, avoid mutating the input parameter):

if request["tokenizer"] == "cohere/command-r":
    request = {"model: "command-r", **request}

# now accept a num_generations argument in the JSON and will always return an array of generations"
# Note that the API version is decoupled from the model version.
DEFAULT_API_VERSION: str = "2021-11-08"

TOKENIZE_ENDPOINT: str = "tokenize"
yifanmai (Collaborator):

You could consider using the Hugging Face tokenizer version of the Command R tokenizer to avoid using the API, like so.
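
For example, a local tokenization call with the Hugging Face tokenizer might look like this (the repository id and the need for a recent transformers release are assumptions, and this is not the linked HELM change):

from transformers import AutoTokenizer

# Assumed Hugging Face repo id for the Command R tokenizer.
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")

text = "The quick brown fox jumps over the lazy dog."
token_ids = tokenizer.encode(text, add_special_tokens=False)
print(len(token_ids), tokenizer.convert_ids_to_tokens(token_ids)[:5])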

andyt-cohere (Contributor, Author):

Much cleaner, thank you!

max_sequence_length: 2048
client_spec:
  class_name: "helm.clients.cohere_client.CohereClient"
window_service_spec:
yifanmai (Collaborator):

Related to later note about tokenizers: if you use the Hugging Face tokenizer rather than the Cohere tokenizer API, then you can just delete window_service_spec (which exists to work around the length limit of the Cohere tokenizer API), which will make HELM use the default window service.
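
For illustration, the deployment entry could then reduce to something like the snippet above without the window_service_spec block (the tokenizer_name value here is an assumption):

- name: cohere/command-r
  model_name: cohere/command-r
  tokenizer_name: cohere/command-r  # assumed to resolve to the Hugging Face tokenizer config
  max_sequence_length: 2048  # carried over from the snippet above; adjust to the model's real context length
  client_spec:
    class_name: "helm.clients.cohere_client.CohereClient"
  # no window_service_spec: HELM falls back to the default window service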

andyt-cohere (Contributor, Author):

Thank you! Wonderful advice

andyt-cohere (Contributor, Author) commented Apr 4, 2024

@yifanmai thank you so much for the advice.

Let me know if you can take another look at things! I think we may need some thought on what to preserve / what to delete


Also curious what the consequence is of:

  • Not supporting multiple completions (we could perhaps write a for loop around do_it, as sketched below; it is unclear whether that would violate any designs around parallelism or perhaps cause timeouts)
  • Not supporting log-likelihoods (LLHs) in responses, and responses not explicitly returning tokens (would the text need to be re-tokenized?)

i.e. in src/helm/clients/cohere_client.py

Overall, do these limitations rule out most HELM evals, or can we go far with just generative evals?
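
For reference, the for-loop idea from the first bullet would look roughly like this (a sketch only; do_it stands in for whatever function issues a single-completion chat request, which is an assumption about the client's structure):

from typing import Any, Callable, Dict, List

def collect_completions(do_it: Callable[[], Dict[str, Any]], num_completions: int) -> List[Dict[str, Any]]:
    # Emulate num_completions by issuing one single-generation chat request per completion.
    # Whether this interacts badly with caching, parallelism, or timeouts is the open question above.
    return [do_it() for _ in range(num_completions)]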

yifanmai (Collaborator) left a comment:

FYI: I will be traveling until 2024-04-10 (next Wednesday), so I might not respond to your comments until then.

It's okay to (1) only support single completions and (2) not have tokens in the results (you would set the tokens: List[Token] field to an empty array [] in RequestResult). Your current implementation looks fine and should allow you to run all the scenarios in HELM Lite, which is our "flagship" leaderboard.

You won't be able to run some other scenarios that require token logprobs (e.g. the language modeling scenarios).

The contract is that models marked as FULL_FUNCTIONALITY_TEXT_MODEL_TAG should implement all these features and can support any scenario, whereas models marked as LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG only need to implement returning text, and are only guaranteed to work on HELM Lite scenarios.

We currently don't do any kind of fine-grained capabilities validation (i.e. checking which scenarios work on which models). We may add that in the future.
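
For concreteness, a minimal sketch of (1) and (2) in a client (the import path and field names are recalled from the HELM codebase and may not match exactly):

from helm.common.request import GeneratedOutput, RequestResult

def build_result(response_text: str, cached: bool, request_time: float) -> RequestResult:
    # Single completion with no per-token logprobs: tokens is just an empty list.
    completion = GeneratedOutput(text=response_text, logprob=0.0, tokens=[])
    return RequestResult(
        success=True,
        cached=cached,
        request_time=request_time,
        completions=[completion],
        embedding=[],
    )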

"""
# Cohere has a Python SDK, but it requires additional post-processing to convert their response
# objects (`Generations`) to JSON, the form the cache expects the responses to be in.
response = requests.request(
method="POST",
url=get_cohere_url(CohereClient.GENERATE_ENDPOINT),
url=get_cohere_url(CohereClient.CHAT_ENDPOINT),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding checking what breaks - you should be able to run e.g. cohere/command-light and that should still work.

I'm inclined to keep the old text API endpoint support, so that people can still use them for reproducing old results (at least until the text API endpoint is gone).

As for the tokenizer, can the command-r tokenizer be used for the older command models, or do those use a different tokenizer? Also, is there still a (small) length limit on the tokenizer API? If either of those assumptions are no longer true, it would simplify a lot of HELM code here.

yifanmai (Collaborator) commented Apr 5, 2024

Also, could you rebase / merge with #2551, which requires renaming Sequence to GeneratedOutput? I will try to minimize these kinds of breaking changes to the Client API in the future.

andyt-cohere (Contributor, Author):

No problem! Will be sure to rebase, and will make sure we support Command-Light with the mentioned abstractions.

P.S. I am OOO next week.

andyt-cohere (Contributor, Author):

There doesn't appear to be a tokenizer length cap anymore (I tried some large lengths), but I would need to use the right tokenizer for Command Light (which is currently not the Command R one).

yifanmai (Collaborator):

Hi @andyt-cohere, would you be okay if I made changes directly on your GitHub fork and branch? This is pretty close to being merge-ready, and I can spend some time next week working on it.

@yifanmai yifanmai marked this pull request as ready for review May 7, 2024 21:18
@yifanmai yifanmai requested review from percyliang and farzaank May 7, 2024 21:18
yifanmai (Collaborator) commented May 7, 2024

@andyt-cohere I reverted some of the changes, so now this change only handles adding the Command R model. I'm planning to add the new tokenizer in pull request #2623, and to delete the old tokenizers and window services in another future pull request.

@yifanmai yifanmai self-requested a review May 7, 2024 21:20
@yifanmai yifanmai changed the title Command R Add Command R and Command R+ models May 7, 2024
@yifanmai yifanmai merged commit c5c451c into stanford-crfm:main May 20, 2024
6 checks passed
@yifanmai yifanmai deleted the command_r branch May 20, 2024 20:41
@yifanmai yifanmai restored the command_r branch May 20, 2024 20:41
xuwangyin pushed a commit to xuwangyinx/helm that referenced this pull request Jun 23, 2024
Co-authored-by: Yifan Mai <yifan@cs.stanford.edu>