Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Updated Jan 11, 2025 - Python
A high-throughput and memory-efficient inference and serving engine for LLMs
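This project aims to share the technical principles behind large models, along with hands-on experience (large-model engineering and deploying large-model applications in production).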
Run any open-source LLMs, such as Llama, Mistral, as OpenAI compatible API endpoint in the cloud.
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
SGLang is a fast serving framework for large language models and vision language models.
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools, without migrating your data.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
AICI: Prompts as (Wasm) Programs
RayLLM - LLMs on Ray
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine.
A highly optimized LLM inference acceleration engine for Llama and its variants.
A throughput-oriented high-performance serving framework for LLMs
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
LLM (Large Language Model) Fine-Tuning
Efficient AI Inference & Serving
🧬 Helix is a production-ready GenAI stack for building AI applications with declarative pipelines, knowledge (RAG), API bindings, and first-class testing.
This is a suite of hands-on training materials that show how to scale CV, NLP, and time-series forecasting workloads with Ray.
Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focused mainly on inference acceleration; related works will be added gradually. Contributions are welcome!