🚀 The feature, motivation and pitch
Currently we do not support the best_of parameter, and setting it in a request fails validation.
The sampling argument best_of appears to be handled as an int in client-side code, e.g.:
https://github.com/tenstorrent/vllm/blob/dev/benchmarks/benchmark_serving.py#L784
so in cases like this it is sent as 1 when it is meant to be turned off.
Ideally, best_of = 1 should be handled the same way as the default best_of = None: https://github.com/tenstorrent/vllm/blob/dev/vllm/sampling_params.py#L291.
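A minimal sketch of the proposed server-side behavior, assuming a single completion per request (n = 1, as in the benchmark), where best_of = 1 is equivalent to plain sampling. Both function names are hypothetical and are not part of the vLLM codebase. Only values that actually request best-of sampling are rejected.

```python
from typing import Optional


def normalize_best_of(best_of: Optional[int]) -> Optional[int]:
    """Hypothetical helper: treat best_of == 1 as if it were unset (None).

    Assumes n == 1, so generating a single candidate and returning it
    is the same as ordinary sampling.
    """
    if best_of is None or best_of == 1:
        return None
    return best_of


def validate_best_of(best_of: Optional[int]) -> Optional[int]:
    """Hypothetical validation for a backend without best_of support.

    None and 1 pass through as "unset"; any other value is rejected.
    """
    normalized = normalize_best_of(best_of)
    if normalized is not None:
        raise ValueError(f"best_of={best_of} is not supported")
    return normalized
```

With this normalization, a client that always sends best_of as an int (defaulting to 1) would pass validation, while a real best_of request such as 4 would still fail loudly.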
Alternatives
No response
Additional context
Currently I need to patch the benchmarking script as a workaround.
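A sketch of the kind of client-side workaround this implies: only include best_of in the request body when it asks for more than one candidate. This is not the actual patch to benchmark_serving.py; build_payload is a hypothetical helper, and the "prompt" and "max_tokens" field names assume an OpenAI-style completions request.

```python
from typing import Any, Dict, Optional


def build_payload(prompt: str, max_tokens: int,
                  best_of: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical request builder for an OpenAI-style completions call.

    best_of is omitted unless it requests more than one candidate, so a
    server that rejects the parameter still accepts the default case.
    """
    payload: Dict[str, Any] = {"prompt": prompt, "max_tokens": max_tokens}
    if best_of is not None and best_of > 1:
        payload["best_of"] = best_of
    return payload
```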