This repository has been archived by the owner on Jun 24, 2024. It is now read-only.
Description
The `tokenizers` crate by HuggingFace should give us a more correct tokenizer implementation than the one we're currently using.
It looks like a LLaMA implementation has already landed in transformers (huggingface/transformers#21955), and @Narsil has since shared an additional PR on the `tokenizers` crate (huggingface/tokenizers#1183). I'm not sure what that PR fixes, but I assume its changes are necessary.
It seems we have everything we need to use the new tokenizer. One important question remains, though: are we allowed to distribute the tokenizer file? Can it be considered completely independent of the weights?