The code in this repository was used to produce the blog post "How continuous batching enables 23x throughput in LLM inference while reducing p50 latency".
Forked from anyscale/llm-continuous-batching-benchmarks.
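
For context, continuous batching (also called iteration-level scheduling) refills a finished request's batch slot immediately instead of waiting for the whole batch to drain, which is what drives the throughput gains measured here. Below is a minimal, illustrative Python sketch of that scheduling loop; the `Request` class, token counts, and batch size are made-up assumptions for the example, not the benchmark code in this repo.

```python
# Minimal sketch of continuous (iteration-level) batching, for illustration
# only -- not the actual benchmark code in this repository.

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    rid: int
    tokens_left: int               # decode steps remaining (made up for the demo)
    generated: list = field(default_factory=list)


def continuous_batching(requests, max_batch_size=4):
    """Run one decode step per iteration; admit new requests as slots free up."""
    waiting = deque(requests)
    running = []
    step = 0
    while waiting or running:
        # Iteration-level scheduling: a finished request's slot is reused
        # immediately, instead of waiting for the whole batch to finish.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        # One decode iteration for every running request.
        for req in running:
            req.generated.append(f"tok{step}")
            req.tokens_left -= 1
        # Evict finished requests so their slots can be refilled next step.
        running = [r for r in running if r.tokens_left > 0]
        step += 1
    return step


if __name__ == "__main__":
    reqs = [Request(rid=i, tokens_left=n) for i, n in enumerate([2, 8, 3, 8, 1])]
    print("decode iterations:", continuous_batching(reqs))
```

With static batching, the same workload would take one full batch of 8 steps plus a second batch for the leftover request; here the short requests free their slots early, so the fifth request starts long before the longest one finishes.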