[Bug]: Sending a /v1/completions API request to the TtMllamaForConditionalGeneration model crashes the model backend #53
Closed
Description
Your current environment
vLLM branch: dev (last verified commit: 2f33504)
tt-metal branch: main (last verified commit: 47fb1a2)
Model Input Dumps
No response
🐛 Describe the bug
When running meta-llama/Llama-3.2-11B-Vision-Instruct with TtMllamaForConditionalGeneration (https://github.com/tenstorrent/tt-metal/blob/main/models/demos/llama3/tt/generator_vllm.py#L82) in vLLM, sending a text-only request to /v1/completions brings down the server.
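For reference, a minimal text-only request of this kind can be sent as follows (a sketch only, assuming the server is listening locally on the default port 8000 and was launched with the model name below):

```python
# Hypothetical reproduction sketch: a plain text-only completion request.
# Assumes the vLLM OpenAI-compatible server is at localhost:8000 and was
# started with meta-llama/Llama-3.2-11B-Vision-Instruct.
import requests

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
        "prompt": "Describe the city of Toronto.",
        "max_tokens": 64,
    },
)
print(response.status_code, response.text)
```

Server-side log from the point at which the request was received: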
INFO: 127.0.0.1:54850 - "POST /v1/completions HTTP/1.1" 200 OK
DEBUG 01-16 19:39:12 async_llm_engine.py:523] Building guided decoding logits processor. Params: GuidedDecodingParams(json=None, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None)
WARNING 01-16 19:39:14 preprocess.py:89] Falling back on <BOS> for decoder start token id because decoder start token id is not available.
INFO 01-16 19:39:14 engine.py:291] Added request cmpl-6ead0464a54b4f64a9d6f4e2a68aea74-0.
ERROR 01-16 19:39:14 engine.py:159] TypeError("prefill_forward() missing 1 required positional argument: 'images'")
ERROR 01-16 19:39:14 engine.py:159] Traceback (most recent call last):
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 157, in start
ERROR 01-16 19:39:14 engine.py:159] self.run_engine_loop()
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 220, in run_engine_loop
ERROR 01-16 19:39:14 engine.py:159] request_outputs = self.engine_step()
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 238, in engine_step
ERROR 01-16 19:39:14 engine.py:159] raise e
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 229, in engine_step
ERROR 01-16 19:39:14 engine.py:159] return self.engine.step()
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/engine/llm_engine.py", line 1405, in step
ERROR 01-16 19:39:14 engine.py:159] outputs = self.model_executor.execute_model(
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/executor/tt_executor.py", line 55, in execute_model
ERROR 01-16 19:39:14 engine.py:159] output = self.driver_worker.execute_model(execute_model_req)
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/worker/tt_worker.py", line 370, in execute_model
ERROR 01-16 19:39:14 engine.py:159] output = self.model_runner.execute_model(
ERROR 01-16 19:39:14 engine.py:159] File "/tt-metal/python_env/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 01-16 19:39:14 engine.py:159] return func(*args, **kwargs)
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/worker/tt_model_runner.py", line 360, in execute_model
ERROR 01-16 19:39:14 engine.py:159] next_token_ids = self._execute_model_single_step(model_input, kv_caches, is_decode, async_out_proc_per_trace, step_idx=i)
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/worker/tt_model_runner.py", line 462, in _execute_model_single_step
ERROR 01-16 19:39:14 engine.py:159] outputs = self.model.prefill_forward(**execute_model_kwargs)
ERROR 01-16 19:39:14 engine.py:159] TypeError: prefill_forward() missing 1 required positional argument: 'images'
ERROR: Exception in ASGI application
...
Traceback (most recent call last):
File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
File "/tt-metal/python_env/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 189, in __call__
response_sent.set()
File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_utils.py", line 83, in collapse_excgroups
raise exc
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
await func()
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 242, in stream_response
async for chunk in self.body_iterator:
File "/home/user/vllm/vllm/entrypoints/openai/serving_completion.py", line 262, in completion_stream_generator
async for prompt_idx, res in result_generator:
File "/home/user/vllm/vllm/utils.py", line 506, in merge_async_iterators
item = await d
File "/home/user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
raise request_output
TypeError: prefill_forward() missing 1 required positional argument: 'images'
CRITICAL 01-16 19:39:14 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO: 127.0.0.1:54866 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
INFO: Shutting down
Metal | INFO | Disabling and clearing program cache on device 4
Metal | INFO | Disabling and clearing program cache on device 0
Metal | INFO | Closing device 4
Metal | INFO | Disabling and clearing program cache on device 4
Metal | INFO | Closing device 0
Metal | INFO | Disabling and clearing program cache on device 0
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [262]
DEBUG 01-16 19:39:15 client.py:157] Shutting down MQLLMEngineClient check health loop.
DEBUG 01-16 19:39:15 client.py:224] Shutting down MQLLMEngineClient output handler.
Device | INFO | Closing user mode device drivers
Ideally the request should either be handled, or a 400 error describing the missing "images" argument should be returned; it should not crash the model backend.
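One possible mitigation, sketched below purely for illustration (this is not the actual tt-metal or vLLM code, and the argument/method names are assumptions), would be to treat `images` as optional in the prefill path so that text-only requests bypass the vision branch instead of raising a TypeError:

```python
# Hypothetical guard around the multimodal prefill call. The names
# `prefill_forward` / `prefill_forward_text` and the kwargs are illustrative,
# not the real generator_vllm.py or tt_model_runner.py signatures.
def safe_prefill_forward(model, tokens, images=None, **kwargs):
    if images is None and hasattr(model, "prefill_forward_text"):
        # Text-only request: skip the vision path entirely.
        return model.prefill_forward_text(tokens, **kwargs)
    # Multimodal request (or no text-only path available): pass images through.
    return model.prefill_forward(tokens, images=images, **kwargs)
```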