Dockerfile.neuron.dev has been deprecated. Please refer to the deep learning containers repository for Neuron TorchServe containers.
Dockerfile.dev has been deprecated. Please refer to the Dockerfile for dev TorchServe containers.
- Prerequisites
- Create TorchServe docker image
- Create torch-model-archiver from container
- Running TorchServe docker image in production
- docker - Refer to the official docker installation guide
- git - Refer to the official git set-up guide
- For base Ubuntu with GPU, install the following: NVIDIA container toolkit and driver
- NOTE - Dockerfiles have not been tested on the Windows native platform.
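Before building, it can help to quickly confirm that the prerequisites are available on your machine; a minimal sanity check (the nvidia-smi step applies to GPU hosts only):

```bash
# Confirm docker and git are installed and on PATH
docker --version
git --version

# GPU hosts only: confirm the NVIDIA driver is visible
nvidia-smi
```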
If you have not cloned the TorchServe source yet, run:
git clone https://github.com/pytorch/serve.git
cd serve/docker
Use the build_image.sh script to build the Docker images. The script builds the production, dev and ci Docker images.
Parameter | Description |
---|---|
-h, --help | Show script help |
-b, --branch_name | Specify a branch name to use. Default: master |
-g, --gpu | Build image with GPU based Ubuntu base image |
-bi, --baseimage | Specify the base docker image. Example: nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04 |
-bt, --buildtype | Which type of docker image to build. Can be one of: production, dev, ci |
-t, --tag | Tag name for image. If not specified, the script uses TorchServe default tag names. |
-cv, --cudaversion | Specify the CUDA version to use. Supported values: cu92, cu101, cu102, cu111, cu113, cu116, cu117, cu118, cu121. Default: cu121 |
-ipex, --build-with-ipex | Specify to build with intel_extension_for_pytorch. If not specified, the script builds without intel_extension_for_pytorch. |
-n, --nightly | Specify to build with TorchServe nightly. |
-py, --pythonversion | Specify the Python version to use. Supported values: 3.8, 3.9, 3.10, 3.11. Default: 3.9 |
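These options can be combined. For example, the following builds a GPU dev image from a specific branch with CUDA 11.8 and Python 3.10 (the branch and tag names here are placeholders):

```bash
./build_image.sh -bt dev -g -cv cu118 -py 3.10 -b my_branch -t torchserve-dev:cu118-py310
```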
PRODUCTION ENVIRONMENT IMAGES
Creates a Docker image with the publicly available torchserve and torch-model-archiver binaries installed.
- To create a CPU based image
./build_image.sh
- To create a GPU based image with a specific CUDA version, for example CUDA 11.7. Options are cu92, cu101, cu102, cu111, cu113, cu116, cu117 and cu118.
- GPU images are built with an NVIDIA CUDA base image. If you want to use ONNX, please specify the base image as shown in the next section.
./build_image.sh -g -cv cu117
- To create an image with a custom tag
./build_image.sh -t torchserve:1.0
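After a build finishes, you can confirm the image exists and check its size with a standard Docker command:

```bash
# List locally available TorchServe images and their sizes
docker images | grep torchserve
```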
NVIDIA CUDA RUNTIME BASE IMAGE
To make use of ONNX, we need to use the NVIDIA CUDA runtime as the base image. This will increase the size of your Docker image.
./build_image.sh -bi nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 -g -cv cu117
DEVELOPER ENVIRONMENT IMAGES
Creates a Docker image with torchserve and torch-model-archiver installed from source.
- For creating a CPU based image:
./build_image.sh -bt dev
- For creating CPU based image with a different branch:
./build_image.sh -bt dev -b my_branch
- For creating GPU based image with cuda version 11.3:
./build_image.sh -bt dev -g -cv cu113
- For creating GPU based image with cuda version 11.1:
./build_image.sh -bt dev -g -cv cu111
- For creating GPU based image with cuda version 10.2:
./build_image.sh -bt dev -g -cv cu102
- For creating GPU based image with cuda version 10.1:
./build_image.sh -bt dev -g -cv cu101
- For creating GPU based image with cuda version 9.2:
./build_image.sh -bt dev -g -cv cu92
- For creating GPU based image with a different branch:
./build_image.sh -bt dev -g -cv cu113 -b my_branch
./build_image.sh -bt dev -g -cv cu111 -b my_branch
- For creating image with a custom tag:
./build_image.sh -bt dev -t torchserve-dev:1.0
- For creating image with Intel® Extension for PyTorch*:
./build_image.sh -bt dev -ipex -t torchserve-ipex:1.0
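As a quick smoke test of a freshly built image, you can print the installed TorchServe version. This is a sketch assuming the image's default entrypoint passes arbitrary commands through (which the stock dockerd-entrypoint.sh does for anything other than serve); replace the tag with whatever -t value you used:

```bash
# Print the TorchServe version installed in the image
docker run --rm -it torchserve-dev:1.0 torchserve --version
```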
The following examples will start the container with ports 8080/81/82 and 7070/71 exposed to localhost.
TorchServe's Dockerfile configures ports 8080, 8081, 8082, 7070 and 7071 to be exposed to the host by default.
When mapping these ports to the host, make sure to specify localhost or a specific IP address.
For the latest version, you can use the latest tag:
docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:latest
For specific versions you can pass in the specific tag to use (ex: pytorch/torchserve:0.1.1-cpu):
docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:0.1.1-cpu
To run an image built with Intel® Extension for PyTorch* (using the tag from the build example above):
docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 torchserve-ipex:1.0
For the latest GPU image, using GPU devices 1 and 2:
docker run --rm -it --gpus '"device=1,2"' -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:latest-gpu
For specific versions you can pass in the specific tag to use (ex: 0.1.1-cuda10.1-cudnn7-runtime):
docker run --rm -it --gpus all -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:0.1.1-cuda10.1-cudnn7-runtime
For the latest version, you can use the latest-gpu tag:
docker run --rm -it --gpus all -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -p 127.0.0.1:7070:7070 -p 127.0.0.1:7071:7071 pytorch/torchserve:latest-gpu
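For longer-running deployments you may prefer to start the container detached with a name, then follow its logs and stop it with standard Docker commands; for example:

```bash
# Start detached with a container name
docker run -d --rm --name torchserve -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 pytorch/torchserve:latest

# Follow the TorchServe logs
docker logs -f torchserve

# Stop (and, because of --rm, remove) the container when done
docker stop torchserve
```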
TorchServe's inference and management APIs can be accessed on localhost over ports 8080 and 8081 respectively. Example:
curl http://localhost:8080/ping
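On a healthy instance the response should look similar to the following:

```bash
$ curl http://localhost:8080/ping
{
  "status": "Healthy"
}
```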
To create a mar (model archive) file for TorchServe deployment, you can follow these steps:
- Start the container, sharing your local model-store directory (create it if it does not exist) as well as any directory containing custom/example mar contents:
docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 --name mar -v $(pwd)/model-store:/home/model-server/model-store -v $(pwd)/examples:/home/model-server/examples pytorch/torchserve:latest
1.a. If starting the container with Intel® Extension for PyTorch*, add the following lines to config.properties to enable IPEX and the launcher with its default configuration.
ipex_enable=true
cpu_launcher_enable=true
docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 --name mar -v $(pwd)/config.properties:/home/model-server/config.properties -v $(pwd)/model-store:/home/model-server/model-store -v $(pwd)/examples:/home/model-server/examples torchserve-ipex:1.0
- List your containers, or skip this step if you already know the container name
docker ps
- Attach to the running container and get a bash prompt
docker exec -it <container_name> /bin/bash
You will land at /home/model-server/.
- Download the model weights if you have not done so already (they are not part of the repo)
curl -o /home/model-server/examples/image_classifier/densenet161-8d451a50.pth https://download.pytorch.org/models/densenet161-8d451a50.pth
- Now execute the torch-model-archiver command, e.g.
torch-model-archiver --model-name densenet161 --version 1.0 --model-file /home/model-server/examples/image_classifier/densenet_161/model.py --serialized-file /home/model-server/examples/image_classifier/densenet161-8d451a50.pth --export-path /home/model-server/model-store --extra-files /home/model-server/examples/image_classifier/index_to_name.json --handler image_classifier
Refer to torch-model-archiver for details.
- The densenet161.mar file should now be present in /home/model-server/model-store
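Because the model store was bind-mounted in step 1, the mar file is also available on the host and is already visible to the running TorchServe instance. As a sketch of a quick end-to-end check (the endpoints below are TorchServe's standard management and inference APIs, and kitten.jpg ships with the cloned examples directory):

```bash
# Register the archive from the model store and start one worker
curl -X POST "http://localhost:8081/models?url=densenet161.mar&initial_workers=1"

# Run a test inference against the registered model
curl http://localhost:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg
```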
You may want to consider the following aspects / Docker options when deploying TorchServe in production with Docker.

- Shared Memory Size
  - shm-size - The shm-size parameter allows you to specify the shared memory that a container can use. It enables memory-intensive containers to run faster by giving more access to allocated memory.
- User Limits for System Resources
  - --ulimit memlock=-1: Maximum locked-in-memory address space.
  - --ulimit stack: Linux stack size

  The current ulimit values can be viewed by executing ulimit -a. A more exhaustive set of options for resource constraining can be found in the Docker documentation here, here and here.
- Exposing specific ports / volumes between the host & docker env.
  - -p 8080:8080 -p 8081:8081 -p 8082:8082 -p 7070:7070 -p 7071:7071: TorchServe uses default ports 8080 / 8081 / 8082 for REST based inference, management & metrics APIs and 7070 / 7071 for gRPC APIs. You may want to expose these ports to the host for HTTP & gRPC requests between Docker and host.
  - The model store is passed to torchserve with the --model-store option. You may want to consider using a shared volume if you prefer pre-populating models in the model-store directory.
For example,
docker run --rm --shm-size=1g \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-p 127.0.0.1:8080:8080 \
-p 127.0.0.1:8081:8081 \
-p 127.0.0.1:8082:8082 \
-p 127.0.0.1:7070:7070 \
-p 127.0.0.1:7071:7071 \
--mount type=bind,source=/path/to/model/store,target=/tmp/models <container> torchserve --model-store=/tmp/models
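To verify that these settings took effect, you can inspect the effective limits and shared memory size from inside the running container; a minimal sketch, assuming the container above was also given a name with --name torchserve:

```bash
# Show the effective user limits inside the container
docker exec torchserve bash -c "ulimit -a"

# Show the size of the shared memory mount (/dev/shm) inside the container
docker exec torchserve df -h /dev/shm
```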
Here is an example showing how to serve the MNIST model using Docker.