Implement chat continuation #68
Thanks. I think figuring out a way to get this working will jumpstart further integrations with other projects by making high-speed completions for chat more easily accessible. I don't know the best way to keep track of a chat session (i.e. matching messages to an existing context). I think I saw previously that you might have some user information, or maybe we could use the completion ID somehow? And just to be clear, I'm not expecting this to scale to a significant number of simultaneous conversations. My idea is just that it gives us the ability to continue generation on an existing context without having to reinitialize it. So maybe a couple of contexts, or even just one, would be a great performance boost. And once we do have state saving, or the ability to quickly load up a significant message history, this workaround would no longer be needed.
And I guess this could even be a separate endpoint that accepts the same data as the actual chat completions endpoint, but is implemented differently: a single persistent model and context that just adds the most recent message from the user to the message history and returns the generation, if that makes sense. I didn't know whether the format in your initial message for this issue was the one you wanted to use or not.
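A minimal sketch of that "separate endpoint" idea, assuming FastAPI for the server layer and llama-cpp-python's `create_chat_completion` for generation; the route name, model path, and request shape here are illustrative, not the project's actual server code:

```python
# Sketch of the idea above: one persistent model and one stored conversation.
# FastAPI, the route, and the model path are assumptions for illustration.
from typing import Dict, List

from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()
llm = Llama(model_path="./models/7B/ggml-model.bin")  # hypothetical path

history: List[Dict] = []  # the single persistent conversation


class ChatRequest(BaseModel):
    messages: List[Dict]  # same shape as the real chat completions endpoint


@app.post("/v1/chat/continue")  # hypothetical route name
def continue_chat(req: ChatRequest):
    # Append only the most recent user message; the earlier messages in the
    # request are assumed to match the history the server already holds.
    history.append(req.messages[-1])
    completion = llm.create_chat_completion(messages=history)
    history.append(completion["choices"][0]["message"])
    return completion
```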
No, that was just trying to describe how to check which messages to process. Basically, if one list is a prefix of the other, then just process the difference of the two lists; otherwise we need to start processing from scratch. As for the API, I'll keep the endpoint the same. Unfortunately this workaround isn't guaranteed to work the same as re-processing from scratch, but it's probably worth it to get this functionality in sooner rather than later.
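A minimal sketch of that prefix check, with hypothetical names:

```python
# Sketch of the prefix check described above; names are hypothetical.
def messages_to_process(cached, incoming):
    """If `cached` is a prefix of `incoming`, return only the new messages;
    otherwise everything has to be re-processed from scratch."""
    if len(incoming) >= len(cached) and incoming[: len(cached)] == cached:
        return incoming[len(cached):]  # just the difference of the two lists
    return incoming  # no shared prefix: start over
```

In practice the comparison would presumably happen on the tokenized prompt rather than the raw message objects, which is one likely reason the result isn't guaranteed to match re-processing from scratch exactly.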
Does the API accept chat history?
@djaffer Yes, if you check the docs it's identical to the OpenAI API, where you send in the entire chat history: https://platform.openai.com/docs/api-reference/chat
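For reference, a request carrying the full history might look like this; the base URL assumes a locally running llama-cpp-python server on its default port, and the messages are made up:

```python
# Sending the entire chat history on every request, OpenAI-style.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed local server URL
    json={
        # Some deployments also require a "model" field; omitted here.
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"},
            {"role": "assistant", "content": "Paris."},
            {"role": "user", "content": "And its population?"},  # only new msg
        ]
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```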
Okay, I've got a basic implementation; I just need to clean up some issues, but generally it's working as expected (chats are significantly more responsive). The way I'm implementing this is through a ...

@MillionthOdin16, from the #17 issue, could you give me a hand identifying the prompt formats for various models? For text completion this is user-provided, but for chat it's a little more challenging (as illustrated above) because the finetuned models each expect a slightly different format.
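To illustrate the kind of variation meant here, two commonly cited finetune templates; these are examples of the problem, not necessarily the formats the library settled on, and each model's expected template should be checked against its model card:

```python
# Illustrative only: two well-known finetune prompt templates, showing why
# a single hard-coded chat format doesn't fit every model.
def format_alpaca(instruction):
    # Alpaca-style instruction template
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )


def format_vicuna(messages):
    # Vicuna-style USER/ASSISTANT turns
    out = []
    for m in messages:
        role = "USER" if m["role"] == "user" else "ASSISTANT"
        out.append(f"{role}: {m['content']}")
    out.append("ASSISTANT:")
    return "\n".join(out)
```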
@MillionthOdin16 this is now pushed to main (not on PyPI yet); if you get the chance, can you test it out from source? All you need to do for the server is run it with ...
How do I implement this? I'm completely new to this thing, and have zero coding knowledge beyond hello world. How can I put this to use in Termux on Android 13?
As suggested by @MillionthOdin16, because implementing #44 is taking longer than expected, we should add a simple form of chat continuation for when the previous message history matches, i.e.:

Request 1: [msg1, msg2]
Response 1: msg3
Request 2: [msg1, msg2, msg3, msg4]

In this case we only need to process msg4 and return a new msg5.
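Walking that example through a plain prefix check (message names as above, the slicing itself hypothetical):

```python
# Walking through the example above with a plain prefix check.
cached = ["msg1", "msg2", "msg3"]            # context after Request 1 + Response 1
incoming = ["msg1", "msg2", "msg3", "msg4"]  # Request 2
assert incoming[: len(cached)] == cached     # history matches the cached prefix
to_process = incoming[len(cached):]          # ["msg4"] -- only msg4 needs evaluation
```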