TokenChunker does not support multiple inputs #18
Comments
Hey @not-lain, WOAH 😳 that's a bit embarrassing, haha. Regarding adding batching/list support, I plan to add multiprocessing support (via MPIRE) soon, so we can run these in parallel~! Multiprocessing because I want Chonkie to be the fastest even with batching. Would really appreciate PRs if you're willing to work on this.
On it 🫡
Hey @not-lain! We can probably add a method to the And we can expose the How does that sound?
Hey @not-lain, Just added initial support for batching in the I'd be happy to accept PRs for "native" batching approaches in For now, I think we can close this issue and make different issues for "native" batching support on the various chunkers. Thanks 😊
Awesome, was thinking of doing this over the weekend, but glad it was already implemented.
Issue
I ran the following example provided in the readme file
and I was running into the following error
extra information
I would suggest either updating the example in the README file or updating the BaseChunker to support multiple inputs at the same time. The latter is my go-to suggestion, since it can process multiple samples at once. We can support either lists or args here, preferably lists, since the tokenizers library already supports lists.
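A minimal sketch of what list support could look like, purely for illustration. This is not Chonkie's actual API: `whitespace_chunk` and `chunk_any` are hypothetical names, and the whitespace splitter is only a stand-in for a real tokenizer-based chunker.

```python
from typing import Callable, List, Union


def whitespace_chunk(text: str, chunk_size: int = 4) -> List[str]:
    """Hypothetical single-text chunker: groups whitespace-separated tokens."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]


def chunk_any(
    texts: Union[str, List[str]],
    chunk_fn: Callable[[str], List[str]] = whitespace_chunk,
) -> Union[List[str], List[List[str]]]:
    """Accept a single string or a list of strings (hypothetical wrapper).

    A single string returns one list of chunks; a list of strings returns
    one list of chunks per input, in the same order.
    """
    if isinstance(texts, str):
        return chunk_fn(texts)
    return [chunk_fn(t) for t in texts]


single = chunk_any("a b c d e")
batch = chunk_any(["a b", "c d e f g"])
```

Dispatching on the input type keeps the single-string call working as before, while a list input returns one result per text.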