Adding a GET Interface to Inference Would Allow for Better Performance #20
Description
The current specification does not allow for effective use of Cache-Control (i.e., client-side caching), which is inefficient in production environments. The specification should add a GET request for inference so that clients can cache responses under Cache-Control settings chosen by the server. Let me explain in more detail.
If a user is querying a deterministic model, the response from the endpoint should be the same each time until the model is retrained, at which point the model should get a new version. (For non-deterministic models, such as simulations, the current interface is fine.) The current specification only defines an HTTP POST for querying the model for inference. If an HTTP GET is used with proper Cache-Control settings, the load on the server can be decreased: Cache-Control lets the client cache responses while the server controls the caching policy. Because the server controls the cache, server-side systems such as experimentation can still be used without worrying that the client will get a stale or wrong response. The RFC on HTTP caching is probably better at explaining this than I am and is linked below, with a rough sketch of such a GET endpoint after the link.
RFC on HTTP caching: here
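
As an illustration only, here is a minimal sketch of what a cacheable GET inference route could look like. FastAPI, the route shape, and the `max-age` value are all assumptions on my part, not part of the specification.

```python
from fastapi import FastAPI, Response

app = FastAPI()


def run_model(name: str, version: str, inputs: str) -> list:
    # Stand-in for the actual inference call.
    return [len(inputs)]


# Hypothetical route shape; the real path would come from the specification.
@app.get("/v2/models/{model_name}/versions/{model_version}/infer")
def infer(model_name: str, model_version: str, inputs: str, response: Response):
    # Deterministic model + pinned version means the same inputs always map
    # to the same outputs, so the response is safe to cache on the client.
    result = run_model(model_name, model_version, inputs)

    # Let clients (and shared caches) reuse this response for an hour.
    # The server picks max-age, so it keeps control over how long clients
    # may serve the result from cache.
    response.headers["Cache-Control"] = "public, max-age=3600"
    return {
        "model_name": model_name,
        "model_version": model_version,
        "outputs": result,
    }
```

With something like this, a repeated GET for the same model version and inputs can be answered from the client's cache (or an intermediate cache) until `max-age` expires, and no request reaches the server at all.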
Currently, different implementations of this specification fall back on server-side caching, which is less efficient: although server-side caching can reduce the compute load on the server, the network bandwidth and round-trip delay of the POST request are not eliminated. A good production system should use both client-side and server-side caching for optimal results.
Here is an example of an implementation that uses server-side caching: here
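
For comparison, here is a rough sketch of client-side caching on top of such a GET endpoint. It only honours a `max-age` directive and uses a made-up endpoint URL, so treat it as an illustration of the idea rather than a reference client.

```python
import re
import time

import requests

# url -> (expiry timestamp, response body)
_cache: dict[str, tuple[float, bytes]] = {}


def cached_get(url: str) -> bytes:
    """Return the response body, reusing a cached copy while it is still fresh."""
    entry = _cache.get(url)
    if entry and time.time() < entry[0]:
        # Fresh cached copy: no network round trip, no bandwidth used.
        return entry[1]

    resp = requests.get(url)
    resp.raise_for_status()

    # Honour the server's Cache-Control: max-age=N directive, if present.
    match = re.search(r"max-age=(\d+)", resp.headers.get("Cache-Control", ""))
    if match:
        _cache[url] = (time.time() + int(match.group(1)), resp.content)
    return resp.content


# Hypothetical endpoint from the sketch above; repeated calls within max-age
# are served from the local cache and never hit the network.
body = cached_get("http://localhost:8000/v2/models/my_model/versions/3/infer?inputs=abc")
```

A real client would also handle directives like `no-store` and ETag revalidation (`If-None-Match`), but even this minimal version removes the round trip entirely for repeated queries against an unchanged model version.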