This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Use the HuggingFace llama Tokenizer #35

Closed
@setzer22

Description

The tokenizers crate by HuggingFace should give us a more correct tokenizer implementation than the one we're currently using.

It looks like a LLaMA implementation has already landed in huggingface/transformers#21955, and @Narsil then shared an additional PR on the tokenizers crate, huggingface/tokenizers#1183 (I'm not sure what it fixes, but I assume the changes are necessary).

It seems we have everything we need to use the new tokenizer. One important question remains, though: are we allowed to distribute the tokenizer file? Can it be considered completely independent of the weights?


Labels: issue:enhancement (New feature or request), meta:maintenance (Changes that will make it easier for us to maintain code)
