bump openvino_tokenizers version #1333

Merged: 3 commits, Dec 6, 2024

Conversation

pavel-esir
Contributor

No description provided.

@pavel-esir pavel-esir added the "category: tokenizers" label (Tokenizer class or submodule update) Dec 6, 2024
@pavel-esir pavel-esir added this to the 2025.0 milestone Dec 6, 2024
@ilya-lavrenov ilya-lavrenov self-assigned this Dec 6, 2024
@slyalin slyalin self-requested a review December 6, 2024 11:01
Contributor

@ilya-lavrenov ilya-lavrenov left a comment

@github-actions github-actions bot added the "category: LLM" (LLM pipeline (stateful, static)) and "category: GHA" (CI based on Github actions) labels Dec 6, 2024
@pavel-esir
Contributor Author

Please address your comment from another PR: https://github.com/openvinotoolkit/openvino.genai/pull/1246/files#r1853668649

Done.

@pavel-esir
Contributor Author

The difference between the HF chat_sample and our chat_sample arises because the tests used LlamaTokenizer, which differs from the tokenizer that AutoTokenizer selects by default. LlamaTokenizer gives slightly different results than AutoTokenizer, and openvino_tokenizers is aligned with AutoTokenizer.

Once openvino_tokenizers fixed an issue and became aligned with AutoTokenizer/LlamaTokenizerFast, tokens with id 29871 appeared in its output. These tokens are missing from the LlamaTokenizer output, so the precommit test results started to differ.
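
A minimal sketch of how such a mismatch could be located, assuming the two token id sequences have already been obtained from the two tokenizers. The helper name `first_divergence` and the id lists are hypothetical and made up for illustration; only the id 29871 comes from the analysis above.

```python
from typing import Optional, Sequence

EXTRA_TOKEN_ID = 29871  # the token id named in the analysis above


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the first index where two id sequences differ, or None if they are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One sequence is a strict prefix of the other.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Hypothetical ids, NOT real tokenizer output.
slow_ids = [1, 100, 200, 300]          # stands in for LlamaTokenizer output
fast_ids = [1, 100, 29871, 200, 300]   # stands in for AutoTokenizer / openvino_tokenizers output

idx = first_divergence(slow_ids, fast_ids)
only_in_fast = set(fast_ids) - set(slow_ids)

print("first differing position:", idx)
print("ids only in the second sequence:", sorted(only_in_fast))
print("extra token present:", EXTRA_TOKEN_ID in only_in_fast)
```

Comparing positions rather than decoded strings keeps the check independent of how each tokenizer detokenizes whitespace.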

@apaniukov, please confirm that this analysis is correct.

@pavel-esir pavel-esir requested a review from apaniukov December 6, 2024 16:24
@ilya-lavrenov ilya-lavrenov added this pull request to the merge queue Dec 6, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 6, 2024
@ilya-lavrenov ilya-lavrenov merged commit ee91fcf into openvinotoolkit:master Dec 6, 2024
54 checks passed
@pavel-esir pavel-esir deleted the bum_ov_tok branch December 7, 2024 13:01
sungeunk pushed a commit to sungeunk/openvino.genai that referenced this pull request Dec 16, 2024
Labels
category: GHA (CI based on Github actions)
category: LLM (LLM pipeline (stateful, static))
category: tokenizers (Tokenizer class or submodule update)
4 participants