[Feature]: Support for logprobs sampling parameter in TT backend #37
Description
🚀 The feature, motivation and pitch
I'm working on evaluating Llama3.1-70B on the MMLU and MMLU-Pro datasets from the Language Model Evaluation Harness, in order to compare the benchmark numbers reported by Meta (https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct#instruction-tuned-models) with what Tenstorrent achieves.
These evaluations rely on the logprobs output of the model: https://cookbook.openai.com/examples/using_logprobs. However, the TT backend currently does not support this sampling parameter (https://github.com/tenstorrent/vllm/blob/dev/vllm/worker/tt_model_runner.py#L430), as can be seen when running the evaluation harness:
ERROR 11-23 14:35:16 engine.py:159] AssertionError('Currently not supporting logprobs')
ERROR 11-23 14:35:16 engine.py:159] Traceback (most recent call last):
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 157, in start
ERROR 11-23 14:35:16 engine.py:159] self.run_engine_loop()
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 220, in run_engine_loop
ERROR 11-23 14:35:16 engine.py:159] request_outputs = self.engine_step()
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 238, in engine_step
ERROR 11-23 14:35:16 engine.py:159] raise e
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 229, in engine_step
ERROR 11-23 14:35:16 engine.py:159] return self.engine.step()
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/engine/llm_engine.py", line 1402, in step
ERROR 11-23 14:35:16 engine.py:159] outputs = self.model_executor.execute_model(
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/executor/tt_executor.py", line 55, in execute_model
ERROR 11-23 14:35:16 engine.py:159] output = self.driver_worker.execute_model(execute_model_req)
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/worker/tt_worker.py", line 333, in execute_model
ERROR 11-23 14:35:16 engine.py:159] inputs = self.prepare_input(execute_model_req)
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/worker/worker_base.py", line 291, in prepare_input
ERROR 11-23 14:35:16 engine.py:159] return self._get_driver_input_and_broadcast(execute_model_req)
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
ERROR 11-23 14:35:16 engine.py:159] self.model_runner.prepare_model_input(
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/worker/tt_model_runner.py", line 192, in prepare_model_input
ERROR 11-23 14:35:16 engine.py:159] self._validate_sampling_params(sampling_params)
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/worker/tt_model_runner.py", line 430, in _validate_sampling_params
ERROR 11-23 14:35:16 engine.py:159] assert sampling_params.logprobs is None, "Currently not supporting logprobs"
ERROR 11-23 14:35:16 engine.py:159] AssertionError: Currently not supporting logprobs
INFO: 127.0.0.1:48296 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
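The same assertion can be hit with a single request against the OpenAI-compatible /v1/completions endpoint, without the full harness. A minimal sketch, assuming the vLLM server from the run command below is listening on 127.0.0.1:8000 (the prompt text and the logprobs value are placeholders, chosen to resemble the harness' loglikelihood-style requests):
import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/completions",
    json={
        "model": "meta-llama/Meta-Llama-3.1-70B",
        "prompt": "The capital of France is",  # placeholder prompt
        "max_tokens": 1,
        "logprobs": 5,   # requesting top-5 logprobs per token is what trips the assertion
        "echo": True,
    },
    timeout=60,
)
print(resp.status_code)  # currently 500 Internal Server Error
print(resp.json())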
Steps to reproduce are located here: https://github.com/tenstorrent/tt-inference-server/tree/main/evals
Example of a run command:
lm_eval \
--model local-completions \
--model_args model=meta-llama/Meta-Llama-3.1-70B,base_url=http://127.0.0.1:8000/v1/completions,num_concurrent=32,max_retries=4,tokenized_requests=False,add_bos_token=True \
--gen_kwargs model=meta-llama/Meta-Llama-3.1-70B,stream=False \
--tasks mmlu \
--batch_size auto \
--output_path /home/mkordic/lm-evaluation-harness/eval_output \
--seed 42 \
--log_samples
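For context on what the feature would need to produce: a rough sketch of how per-token logprobs could be derived from the logits the model already computes, via a log-softmax plus top-k. The helper name, tensor shapes, and top-k handling below are assumptions for illustration only, not the existing tt_model_runner.py code:
import torch

def top_logprobs_from_logits(logits: torch.Tensor, sampled_ids: torch.Tensor, k: int = 5):
    """Hypothetical helper: logits is [batch, vocab], sampled_ids is [batch].
    Returns the logprob of each sampled token plus the top-k alternatives,
    roughly the information SamplingParams.logprobs asks for."""
    logprobs = torch.log_softmax(logits.float(), dim=-1)                     # [batch, vocab]
    sampled_logprobs = logprobs.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
    topk_vals, topk_ids = logprobs.topk(k, dim=-1)                           # top-k per sequence
    return sampled_logprobs, topk_vals, topk_ids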
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.