## Overview
This is a list of changes to the public HTTP interface of the `llama-server` example. Collaborators are encouraged to edit this post to reflect important changes to the API that end up merged into the `master` branch.

If you are building a 3rd-party project that relies on `llama-server`, it is recommended to follow this issue and to check it carefully before upgrading to new versions.
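Because changes are tracked by build number, a client can defensively check the server build before relying on newer behavior. A minimal sketch, assuming the `/props` endpoint (listed in the table below) exposes a build string; the `build_info` field name and its `b<NNNN>-<hash>` shape are assumptions — verify them against your server's actual response:

```python
import json
from urllib.request import urlopen

def parse_build(props: dict) -> int:
    # Assumed response shape: {"build_info": "b4599-<hash>", ...}.
    # The field name is an assumption; check your server's /props output.
    build = props.get("build_info", "")
    digits = "".join(ch for ch in build.split("-")[0] if ch.isdigit())
    return int(digits) if digits else 0

def server_build(base_url: str = "http://localhost:8080") -> int:
    # Query the running server; the URL is an assumption (default port).
    with urlopen(f"{base_url}/props") as resp:
        return parse_build(json.load(resp))
```

A client could then refuse to use, say, the `tools` field unless `server_build() >= 4599`.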
See also:
## Recent API changes (most recent at the top)
| version | PR | description |
|---|---|---|
b4599 | #9639 | /v1/chat/completions now supports tools & tool_choice |
TBD. | #10974 | /v1/completions is now OAI-compat |
TBD. | #10783 | logprobs is now OAI-compat, default to pre-sampling probs |
TBD. | #10861 | /embeddings supports pooling type none |
TBD. | #10853 | Add optional "tokens" output to /completions endpoint |
b4337 | #10803 | Remove penalize_nl |
b4265 | #10626 | CPU docker images working directory changed to /app |
b4285 | #10691 | (Again) Change /slots and /props responses |
b4283 | #10704 | Change /slots and /props responses |
b4027 | #10162 | /slots endpoint: remove slot[i].state, add slot[i].is_processing |
b3912 | #9865 | Add option to time limit the generation phase |
b3911 | #9860 | Remove self-extend support |
b3910 | #9857 | Remove legacy system prompt support |
b3897 | #9776 | Change default security settings, /slots is now disabled by defaultEndpoints now check for API key if it's set |
b3887 | #9510 | Add /rerank endpoint |
b3754 | #9459 | Add [DONE]\n\n in OAI stream response to match spec |
b3721 | #9398 | Add seed_cur to completion response |
b3683 | #9308 | Environment variable updated |
b3599 | #9056 | Change /health and /slots |
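As an illustration of the b4599 entry (#9639), an OpenAI-style request carrying `tools` and `tool_choice` can now be sent directly to `/v1/chat/completions`. A hedged sketch of such a payload — the `get_weather` function schema is a made-up example, and the server URL is an assumption:

```python
import json

# Illustrative OpenAI-compatible chat request using the "tools" and
# "tool_choice" fields supported since b4599 (#9639).
payload = {
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            # Hypothetical tool for illustration, not part of llama-server.
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",
}
body = json.dumps(payload)
# POST `body` to http://localhost:8080/v1/chat/completions
# (host and port are assumptions; use your server's address).
```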
For older changes, use:
`git log --oneline -p b3599 -- examples/server/README.md`
## Upcoming API changes
- TBD