Repository for Alpa's multi-model serving system.
This is the official implementation of our OSDI '23 paper, "AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving."
To reproduce the main results in the paper, see the artifact folder and follow the instructions there.