Adding a GET Interface to Inference Would Allow for Better Performance #20
Description
The current specification does not allow for effective use of Cache-Control (i.e., client-side caching), which is inefficient in production environments. The specification should add a GET request for inference so that clients can cache responses under Cache-Control settings chosen by the server. Let me explain in more detail.
If a user is querying a deterministic model, the response from the endpoint should be the same each time until the model is retrained, at which point the model should get a new version. (For non-deterministic models, such as simulations, the current interface is fine.) The current specification only defines an HTTP POST for querying the model for inference. If an HTTP GET is used with proper Cache-Control settings, the load on the server can be decreased: Cache-Control lets the client cache responses while the server controls the caching policy. Because the server controls the cache, server-side systems such as experimentation can still be used without worrying that the client will get a stale or wrong response. The RFC on HTTP caching is probably better at explaining this than I am and is linked below, with a rough sketch of such a GET endpoint after the link.
RFC on HTTP caching: here
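
As an illustration only, here is a minimal sketch of what a cacheable GET inference route could look like. FastAPI, the route shape, and the `max-age` value are all assumptions on my part, not part of the specification.

```python
from fastapi import FastAPI, Response

app = FastAPI()


def run_model(name: str, version: str, inputs: str) -> list:
    # Stand-in for the actual inference call.
    return [len(inputs)]


# Hypothetical route shape; the real path would come from the specification.
@app.get("/v2/models/{model_name}/versions/{model_version}/infer")
def infer(model_name: str, model_version: str, inputs: str, response: Response):
    # Deterministic model + pinned version means the same inputs always map
    # to the same outputs, so the response is safe to cache on the client.
    result = run_model(model_name, model_version, inputs)

    # Let clients (and shared caches) reuse this response for an hour.
    # The server picks max-age, so it keeps control over how long clients
    # may serve the result from cache.
    response.headers["Cache-Control"] = "public, max-age=3600"
    return {
        "model_name": model_name,
        "model_version": model_version,
        "outputs": result,
    }
```

With something like this, a repeated GET for the same model version and inputs can be answered from the client's cache (or an intermediate cache) until `max-age` expires, and no request reaches the server at all.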
Currently, different implementations of this specification fall back on server-side caching, which is less efficient: although server-side caching can reduce the compute load on the server, the network bandwidth and round-trip delay of the POST request are not eliminated. A good production system should use both client-side and server-side caching for optimal results.
Here is an example of an implementation that uses server-side caching: here
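
For comparison, here is a rough sketch of client-side caching on top of such a GET endpoint. It only honours a `max-age` directive and uses a made-up endpoint URL, so treat it as an illustration of the idea rather than a reference client.

```python
import re
import time

import requests

# url -> (expiry timestamp, response body)
_cache: dict[str, tuple[float, bytes]] = {}


def cached_get(url: str) -> bytes:
    """Return the response body, reusing a cached copy while it is still fresh."""
    entry = _cache.get(url)
    if entry and time.time() < entry[0]:
        # Fresh cached copy: no network round trip, no bandwidth used.
        return entry[1]

    resp = requests.get(url)
    resp.raise_for_status()

    # Honour the server's Cache-Control: max-age=N directive, if present.
    match = re.search(r"max-age=(\d+)", resp.headers.get("Cache-Control", ""))
    if match:
        _cache[url] = (time.time() + int(match.group(1)), resp.content)
    return resp.content


# Hypothetical endpoint from the sketch above; repeated calls within max-age
# are served from the local cache and never hit the network.
body = cached_get("http://localhost:8000/v2/models/my_model/versions/3/infer?inputs=abc")
```

A real client would also handle directives like `no-store` and ETag revalidation (`If-None-Match`), but even this minimal version removes the round trip entirely for repeated queries against an unchanged model version.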