The inference server can be used to perform inference on any model trained with PaddlePaddle. It provides an HTTP endpoint: it reads a trained model (a topology file and a parameter file) and serves HTTP requests on port 8000. Because models differ in the number and types of their inputs, the HTTP API differs slightly for each model; please see HTTP API for the API spec, and here for request examples of different models that illustrate the differences.
We will first show how to obtain the PaddlePaddle model, and then how to start the server.
We will use Docker to run the demo; if you are not familiar with Docker, please check out this TLDR.
A neural network model in PaddlePaddle contains two parts: the parameters and the topology.
A PaddlePaddle training script contains the neural network topology, which is represented by layers. For example:

```python
img = paddle.layer.data(name="img", type=paddle.data_type.dense_vector(784))
hidden = paddle.layer.fc(input=img, size=200)
prediction = paddle.layer.fc(input=hidden, size=10, act=paddle.activation.Softmax())
```
The parameter instance is created by the topology and updated by the `train` method.

```python
...
params = paddle.parameters.create(cost)
...
trainer = paddle.trainer.SGD(cost=cost, parameters=params)
...
```
PaddlePaddle stores the topology and the parameters separately.
- To serialize a topology, we need to create a topology instance explicitly from the output layers of the neural network, and then invoke its `serialize_for_inference` method:

  ```python
  # Save the inference topology to protobuf.
  inference_topology = paddle.topology.Topology(layers=prediction)
  with open("inference_topology.pkl", 'wb') as f:
      inference_topology.serialize_for_inference(f)
  ```
- To save the parameters, we need to invoke the `save_parameter_to_tar` method of `trainer`:

  ```python
  with open('param.tar', 'w') as f:
      trainer.save_parameter_to_tar(f)
  ```
After serializing the parameters and the topology into two files, we can use them to set up an inference server. For a working example, please see train.py.
Make sure the `inference_topology.pkl` and `param.tar` mentioned in the last section are in your current working directory, and run the command:

```bash
docker run --name paddle_serve -v `pwd`:/data -d -p 8000:80 -e WITH_GPU=0 paddlepaddle/book:serve
```
The above command will mount the current working directory to the `/data/` directory inside the Docker container. The inference server will load the model topology and parameters that we just created from there.
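To verify that the container started and the model loaded without errors, you can inspect it with standard Docker commands:

```bash
docker ps --filter name=paddle_serve   # the container should be listed as "Up"
docker logs paddle_serve               # inspect the server's startup logs
```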
To run the inference server with GPU support, please make sure you have nvidia-docker first, and run:

```bash
nvidia-docker run --name paddle_serve -v `pwd`:/data -d -p 8000:80 -e WITH_GPU=1 paddlepaddle/book:serve-gpu
```

This command will start the server on port 8000.
After you are done with the demo, you can run `docker stop paddle_serve` to stop the container.
The inference server handles HTTP POST requests on path `/`. The content type of both the request and the response is JSON. You need to manually set the `Content-Type` request header to `Content-Type: application/json`.
The request is a single JSON dictionary whose keys are the layer names of the input data. The type of each corresponding value is determined by the data type; in most cases the value will be a list of floats. For completeness, we list all data types below.

There are twelve data types supported by PaddlePaddle:
| | plain | a sequence | a sequence of sequences |
|---|---|---|---|
| dense | [ f, f, f, f, ... ] | [ [f, f, f, ...], [f, f, f, ...] ] | [ [[f, f, ...], [f, f, ...]], [[f, f, ...], [f, f, ...]], ... ] |
| integer | i | [i, i, ...] | [ [i, i, ...], [i, i, ...], ... ] |
| sparse (binary) | [i, i, ...] | [ [i, i, ...], [i, i, ...], ... ] | [ [[i, i, ...], [i, i, ...], ...], [[i, i, ...], [i, i, ...], ...], ... ] |
| sparse (value) | [ [i, f], [i, f], ... ] | [ [[i, f], [i, f], ... ], ... ] | [ [[[i, f], [i, f], ... ], ...], ... ] |
In the table, `i` stands for an `int` value and `f` stands for a `float` value.
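For instance, as the table shows, a sparse binary vector is sent as the list of indices of its non-zero entries. A request for a hypothetical sparse input layer named `bow` could look like:

```json
{
    "bow": [0, 7, 283]
}
```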
Which `data_type` should be used is decided by the training topology. For example:
- For image data, the input is usually a plain dense vector: we flatten the image into a vector, and the pixel values are usually normalized to `[-1.0, 1.0]` or `[0.0, 1.0]` (depending on the neural network). A code sketch preparing both kinds of input follows this list.

  ```
  +-------+
  |243 241|
  |139 211|  +----> [0.95, 0.95, 0.54, 0.82]
  +-------+
  ```
- For text data, each word of the text is represented by an integer. The mapping between words and integers is decided by the training process. A sentence is represented by a list of integers.

  ```
  I am good .
       +
       |
       v
  23 942 402 19  +-----> [23, 942, 402, 19]
  ```
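As a concrete illustration, here is a small sketch in Python of how both inputs above could be prepared. The pixel values and the word-to-ID map are hypothetical; the real map is fixed by the training process.

```python
# Hypothetical example: flatten and normalize a 2x2 grayscale image, and
# map a sentence to the word IDs shown in the diagrams above.
pixels = [[243, 241],
          [139, 211]]  # a 2x2 grayscale image

# Flatten row by row and normalize each pixel from [0, 255] to [0.0, 1.0].
img = [round(p / 255.0, 2) for row in pixels for p in row]
# -> [0.95, 0.95, 0.55, 0.83], approximately the vector shown above

word_ids = {"I": 23, "am": 942, "good": 402, ".": 19}  # decided by training
sentence = [word_ids[w] for w in "I am good .".split()]
# -> [23, 942, 402, 19]

request_body = {"img": img, "sentence": sentence}
```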
A sample request for a 2x2 image (four pixel values) and a sentence could be:
```json
{
    "img": [
        0.95,
        0.95,
        0.54,
        0.82
    ],
    "sentence": [
        23,
        942,
        402,
        19
    ]
}
```
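Such a request could be sent with `curl`, for example, assuming the server from the previous section is listening on localhost port 8000:

```bash
curl -X POST http://localhost:8000/ \
     -H 'Content-Type: application/json' \
     -d '{"img": [0.95, 0.95, 0.54, 0.82], "sentence": [23, 942, 402, 19]}'
```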
The response is a JSON object, too. An example response is:
```json
{
    "code": 0,
    "data": [
        [
            0.10060056298971176,
            0.057179879397153854,
            0.1453431099653244,
            0.15825574100017548,
            0.04464773088693619,
            0.1566203236579895,
            0.05657859891653061,
            0.12077419459819794,
            0.08073269575834274,
            0.07926714420318604
        ]
    ],
    "message": "success"
}
```
Here, `code` and `message` represent the status of the request, and `data` corresponds to the outputs of the neural network; these could be the probabilities of each class, the IDs of an output sentence, and so on.
If you have trained a model with train.py and started an inference server, you can use this client to test whether it works correctly.
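If you prefer a self-contained script, a minimal client sketch using the third-party `requests` library could look like the following (the payload is illustrative; the keys must match your model's input layer names):

```python
import json
import requests  # third-party HTTP library, assumed installed

payload = {"img": [0.95, 0.95, 0.54, 0.82]}  # illustrative input

resp = requests.post(
    "http://localhost:8000/",                      # the server started above
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"})  # header the server expects

body = resp.json()
assert body["code"] == 0, body["message"]  # code/message report request status
print(body["data"])                        # network outputs, e.g. class probabilities
```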
We have already prepared the pre-built Docker image `paddlepaddle/book:serve`. Here are the commands if you want to build the Docker images yourself:

```bash
docker build -t paddlepaddle/book:serve .
docker build -t paddlepaddle/book:serve-gpu -f Dockerfile.gpu .
```