Update context window management to avoid context shifts #3176
What are you trying to do?
Today, when the context window limit is reached, a "context shift" occurs, which effectively halves the number of tokens in the context window to make room for new generations. We should avoid this: OpenAI and other tools instead enforce token limits that, when reached, stop generation and let the user know.
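For illustration, here is a minimal Go sketch of the proposed behavior. All names (`generate`, `numCtx`, `nextToken`, the reason strings) are hypothetical, not Ollama's actual internals: instead of shifting the context when the window fills up, generation stops and the reason is surfaced to the caller.

```go
package main

import "fmt"

// generateResult is a hypothetical summary of why generation ended.
type generateResult struct {
	Text   string
	Reason string // e.g. "stop" or "token_limit"
}

// generate is a sketch: it stops when the context window is full
// instead of performing a context shift (discarding older tokens).
func generate(promptTokens, numCtx int, nextToken func() (string, bool)) generateResult {
	used := promptTokens
	var out string
	for {
		if used >= numCtx {
			// Today: shift the context (drop ~half the tokens) and keep going.
			// Proposed: stop and report the reason to the caller.
			return generateResult{Text: out, Reason: "token_limit"}
		}
		tok, done := nextToken()
		if done {
			return generateResult{Text: out, Reason: "stop"}
		}
		out += tok
		used++
	}
}

func main() {
	i := 0
	// Prompt nearly fills a 2048-token window; the model would keep going.
	res := generate(2040, 2048, func() (string, bool) {
		i++
		return " token", i > 100
	})
	fmt.Printf("reason=%s, generated %d chars\n", res.Reason, len(res.Text))
}
```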
How should we solve this?
A few ideas:
- Make sure at least x% of the context window remains available for generation beyond the prompt
- Add a `reason` or similar key to `/api/generate` and `/api/chat` so it's obvious when the token limit is hit (see the sketch after this list)
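To sketch the second idea, the response types for `/api/generate` and `/api/chat` could carry such a key. The field name `done_reason` and its values are assumptions for illustration, not the actual Ollama API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GenerateResponse sketches what an /api/generate reply might look like
// with the proposed key; "done_reason" and its values are hypothetical.
type GenerateResponse struct {
	Model      string `json:"model"`
	Response   string `json:"response"`
	Done       bool   `json:"done"`
	DoneReason string `json:"done_reason,omitempty"` // e.g. "stop" or "token_limit"
}

func main() {
	resp := GenerateResponse{
		Model:      "llama2",
		Response:   "...truncated output...",
		Done:       true,
		DoneReason: "token_limit",
	}
	b, _ := json.MarshalIndent(resp, "", "  ")
	fmt.Println(string(b))
}
```

A client could then branch on this field, for example to warn the user or retry with a larger `num_ctx`.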
What is the impact of not solving this?
Context shifting can cause run-on generations and lower-quality responses.
Anything else?
No response