
Update context window management to avoid context shifts #3176

Open · @jmorganca

Description

What are you trying to do?

Today, upon reaching the context window limit, a "context shift" occurs, effectively halving the number of tokens in the context window to make room for new generations. We should avoid this: OpenAI and other tools instead enforce token limits that, when reached, stop generation and let the user know.
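To illustrate the difference, here is a minimal sketch of stopping at the limit rather than shifting. This is not Ollama's actual code; numCtx, promptTokens, and the halving behavior are assumptions for illustration.

```go
package main

import "fmt"

func main() {
	numCtx := 2048       // assumed context window size
	promptTokens := 1800 // tokens already consumed by the prompt
	generated := 0

	for {
		if promptTokens+generated >= numCtx {
			// Today: a "context shift" would discard roughly half the
			// window and keep generating on the truncated context.
			// Proposed: stop here and surface the reason to the user,
			// as OpenAI-style APIs do.
			fmt.Println("stopping: token limit reached")
			break
		}
		generated++ // decode one more token
	}
	fmt.Printf("generated %d tokens before hitting the limit\n", generated)
}
```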

How should we solve this?

A few ideas:

  • Make sure at least x% of the context window remains available for generation beyond the prompt
  • Add a reason or similar key to /api/generate and /api/chat responses so it's obvious when the token limit is hit (see the sketch after this list)
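To make the second idea concrete, here is a hedged sketch of what a response could look like if a reason-style field were added. The done_reason name and its values ("stop" vs "length") are assumptions for illustration, not the actual API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GenerateResponse sketches a /api/generate response with a hypothetical
// done_reason field distinguishing a natural stop from hitting the limit.
type GenerateResponse struct {
	Model      string `json:"model"`
	Response   string `json:"response"`
	Done       bool   `json:"done"`
	DoneReason string `json:"done_reason,omitempty"` // e.g. "stop" or "length"
}

func main() {
	resp := GenerateResponse{
		Model:      "llama2",
		Response:   "...",
		Done:       true,
		DoneReason: "length", // the token limit was hit before a natural stop
	}
	b, _ := json.MarshalIndent(resp, "", "  ")
	fmt.Println(string(b))
}
```

A client could then branch on this field, for example warning the user or retrying with a larger num_ctx when the reason is "length".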

What is the impact of not solving this?

Possible run-on generations and poorer responses caused by context shifting

Anything else?

No response
