[Feature]: Support for logprobs sampling parameter in TT backend #37
Description
🚀 The feature, motivation and pitch
I'm working on evaluating Llama3.1-70B on the MMLU and MMLU-Pro datasets from the Language Model Evaluation Harness, in order to compare the benchmark numbers reported by Meta (https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct#instruction-tuned-models) with what Tenstorrent achieves.
These evaluations rely on the logprobs output of the model: https://cookbook.openai.com/examples/using_logprobs. However, the TT backend currently does not support this sampling parameter (https://github.com/tenstorrent/vllm/blob/dev/vllm/worker/tt_model_runner.py#L430), as can be seen when running the evaluation harness:
ERROR 11-23 14:35:16 engine.py:159] AssertionError('Currently not supporting logprobs')
ERROR 11-23 14:35:16 engine.py:159] Traceback (most recent call last):
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 157, in start
ERROR 11-23 14:35:16 engine.py:159] self.run_engine_loop()
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 220, in run_engine_loop
ERROR 11-23 14:35:16 engine.py:159] request_outputs = self.engine_step()
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 238, in engine_step
ERROR 11-23 14:35:16 engine.py:159] raise e
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/engine/multiprocessing/engine.py", line 229, in engine_step
ERROR 11-23 14:35:16 engine.py:159] return self.engine.step()
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/engine/llm_engine.py", line 1402, in step
ERROR 11-23 14:35:16 engine.py:159] outputs = self.model_executor.execute_model(
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/executor/tt_executor.py", line 55, in execute_model
ERROR 11-23 14:35:16 engine.py:159] output = self.driver_worker.execute_model(execute_model_req)
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/worker/tt_worker.py", line 333, in execute_model
ERROR 11-23 14:35:16 engine.py:159] inputs = self.prepare_input(execute_model_req)
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/worker/worker_base.py", line 291, in prepare_input
ERROR 11-23 14:35:16 engine.py:159] return self._get_driver_input_and_broadcast(execute_model_req)
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
ERROR 11-23 14:35:16 engine.py:159] self.model_runner.prepare_model_input(
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/worker/tt_model_runner.py", line 192, in prepare_model_input
ERROR 11-23 14:35:16 engine.py:159] self._validate_sampling_params(sampling_params)
ERROR 11-23 14:35:16 engine.py:159] File "/home/mkordic/vllm_test/vllm/vllm/worker/tt_model_runner.py", line 430, in _validate_sampling_params
ERROR 11-23 14:35:16 engine.py:159] assert sampling_params.logprobs is None, "Currently not supporting logprobs"
ERROR 11-23 14:35:16 engine.py:159] AssertionError: Currently not supporting logprobs
INFO: 127.0.0.1:48296 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
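The same assertion can be hit with a single request against the OpenAI-compatible /v1/completions endpoint, without the full harness. A minimal sketch, assuming the vLLM server from the run command below is listening on 127.0.0.1:8000 (the prompt text and the logprobs value are placeholders, chosen to resemble the harness' loglikelihood-style requests):
import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/completions",
    json={
        "model": "meta-llama/Meta-Llama-3.1-70B",
        "prompt": "The capital of France is",  # placeholder prompt
        "max_tokens": 1,
        "logprobs": 5,   # requesting top-5 logprobs per token is what trips the assertion
        "echo": True,
    },
    timeout=60,
)
print(resp.status_code)  # currently 500 Internal Server Error
print(resp.json())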
Steps to reproduce are located here: https://github.com/tenstorrent/tt-inference-server/tree/main/evals
Example of a run command:
lm_eval \
--model local-completions \
--model_args model=meta-llama/Meta-Llama-3.1-70B,base_url=http://127.0.0.1:8000/v1/completions,num_concurrent=32,max_retries=4,tokenized_requests=False,add_bos_token=True \
--gen_kwargs model=meta-llama/Meta-Llama-3.1-70B,stream=False \
--tasks mmlu \
--batch_size auto \
--output_path /home/mkordic/lm-evaluation-harness/eval_output \
--seed 42 \
--log_samples
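For context on what the feature would need to produce: a rough sketch of how per-token logprobs could be derived from the logits the model already computes, via a log-softmax plus top-k. The helper name, tensor shapes, and top-k handling below are assumptions for illustration only, not the existing tt_model_runner.py code:
import torch

def top_logprobs_from_logits(logits: torch.Tensor, sampled_ids: torch.Tensor, k: int = 5):
    """Hypothetical helper: logits is [batch, vocab], sampled_ids is [batch].
    Returns the logprob of each sampled token plus the top-k alternatives,
    roughly the information SamplingParams.logprobs asks for."""
    logprobs = torch.log_softmax(logits.float(), dim=-1)                     # [batch, vocab]
    sampled_logprobs = logprobs.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
    topk_vals, topk_ids = logprobs.topk(k, dim=-1)                           # top-k per sequence
    return sampled_logprobs, topk_vals, topk_ids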
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.