Fix partial unicode characters issue #4837
Merge dev branch
May I ask something else? Why check Sometimes it (using
The fix looks good. I'll test and merge it later.
That's probably a bug. If you could move the eos_token check in those two up, that would be helpful.
Thanks.
Np. I made some tests with Chinese text and it seems to be working well now. If you notice anything else weird, please feel free to submit a new PR.
Checklist:
The issue link: #4828 (comment)
Causes of the issue:
Slicing the output by token ids can cut a multi-byte character in half, so the tokenizer cannot decode it properly.
I fixed generate_reply_HF as well as exllama and exllamav2. The remaining loaders appear to build the reply by slicing strings rather than token ids (or maybe I don't fully understand them).
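To illustrate the failure mode, here is a minimal sketch. It is not the code from this PR: the byte-fragment "tokens" and the helper names toy_decode and stream_incremental are hypothetical stand-ins for a real byte-level tokenizer. It shows that decoding each new slice of tokens on its own yields U+FFFD replacement characters, and that one possible workaround is to hold back a slice until it decodes cleanly.

```python
# Toy stand-in for a byte-level tokenizer (assumption): each "token" is a
# fragment of UTF-8 bytes, and decoding never raises on incomplete input.
def toy_decode(tokens):
    return b"".join(tokens).decode("utf-8", errors="replace")

text = "你好"
raw = text.encode("utf-8")  # 6 bytes, 3 bytes per character
# Token boundaries deliberately do not line up with character boundaries.
tokens = [raw[0:2], raw[2:4], raw[4:6]]

# Naive streaming: decode each newly generated token by itself.
naive = "".join(toy_decode(tokens[i:i + 1]) for i in range(len(tokens)))

def stream_incremental(tokens):
    """Yield the cumulative reply, skipping steps that end mid-character."""
    reply = ""
    start = 0
    for end in range(1, len(tokens) + 1):
        piece = toy_decode(tokens[start:end])
        if "\ufffd" in piece:
            # The slice contains an incomplete character; wait for more tokens.
            continue
        reply += piece
        start = end
        yield reply

final = list(stream_incremental(tokens))[-1]
print(repr(naive))  # contains U+FFFD replacement characters
print(final)
```

The trade-off in this sketch is that text is held back until the whole pending slice decodes cleanly, and a model that legitimately emits U+FFFD would stall the stream, so a real loader may need a more careful check.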
About exllama and exllamav2:
They seem to decode the full token sequence on every step, so each new reply always replaces the old one.
The result looks fine in both the web UI and the API, but wrong characters are actually generated along the way.
So I fixed them too.
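The behaviour described above can be sketched as follows. This is an illustration under assumptions, not the exllama or exllamav2 code: decode_all, raw_snapshots and clean_snapshots are hypothetical names, and the tokens are plain UTF-8 byte fragments. Re-decoding everything on each step produces a correct final string, but the intermediate snapshots can end in U+FFFD.

```python
# Toy full-sequence decoder (assumption): tokens are UTF-8 byte fragments.
def decode_all(tokens):
    return b"".join(tokens).decode("utf-8", errors="replace")

# "Hi 你好" with both Chinese characters split across two tokens each.
tokens = [b"Hi ", b"\xe4\xbd", b"\xa0", b"\xe5\xa5", b"\xbd"]

def raw_snapshots(tokens):
    """Re-decode the whole sequence after every token, as described above."""
    return [decode_all(tokens[:n]) for n in range(1, len(tokens) + 1)]

def clean_snapshots(tokens):
    """Same, but skip snapshots that end in a partially decoded character."""
    out = []
    for n in range(1, len(tokens) + 1):
        snapshot = decode_all(tokens[:n])
        if snapshot.endswith("\ufffd"):
            continue  # the last character is still incomplete
        out.append(snapshot)
    return out

for s in raw_snapshots(tokens):
    print(repr(s))
print(clean_snapshots(tokens))
```

Because every snapshot replaces the previous one, the broken intermediate text is overwritten on screen, which would explain why the output looks fine while wrong characters are still being emitted to streaming consumers.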