[Bug]: Sending a /v1/completions API request to the TtMllamaForConditionalGeneration model crashes the model backend #53
Closed
Description
Your current environment
vLLM branch: dev (last verified commit: 2f33504)
tt-metal branch: main (last verified commit: 47fb1a2)
Model Input Dumps
No response
🐛 Describe the bug
When running meta-llama/Llama-3.2-11B-Vision-Instruct with TtMllamaForConditionalGeneration (https://github.com/tenstorrent/tt-metal/blob/main/models/demos/llama3/tt/generator_vllm.py#L82) in vLLM, sending a text-only request to /v1/completions brings down the server.
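For reference, a minimal text-only request of this kind can be sent as follows (a sketch only, assuming the server is listening locally on the default port 8000 and was launched with the model name below):

```python
# Hypothetical reproduction sketch: a plain text-only completion request.
# Assumes the vLLM OpenAI-compatible server is at localhost:8000 and was
# started with meta-llama/Llama-3.2-11B-Vision-Instruct.
import requests

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
        "prompt": "Describe the city of Toronto.",
        "max_tokens": 64,
    },
)
print(response.status_code, response.text)
```

Server-side log from the point at which the request was received: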
INFO: 127.0.0.1:54850 - "POST /v1/completions HTTP/1.1" 200 OK
DEBUG 01-16 19:39:12 async_llm_engine.py:523] Building guided decoding logits processor. Params: GuidedDecodingParams(json=None, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None)
WARNING 01-16 19:39:14 preprocess.py:89] Falling back on <BOS> for decoder start token id because decoder start token id is not available.
INFO 01-16 19:39:14 engine.py:291] Added request cmpl-6ead0464a54b4f64a9d6f4e2a68aea74-0.
ERROR 01-16 19:39:14 engine.py:159] TypeError("prefill_forward() missing 1 required positional argument: 'images'")
ERROR 01-16 19:39:14 engine.py:159] Traceback (most recent call last):
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 157, in start
ERROR 01-16 19:39:14 engine.py:159] self.run_engine_loop()
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 220, in run_engine_loop
ERROR 01-16 19:39:14 engine.py:159] request_outputs = self.engine_step()
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 238, in engine_step
ERROR 01-16 19:39:14 engine.py:159] raise e
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 229, in engine_step
ERROR 01-16 19:39:14 engine.py:159] return self.engine.step()
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/engine/llm_engine.py", line 1405, in step
ERROR 01-16 19:39:14 engine.py:159] outputs = self.model_executor.execute_model(
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/executor/tt_executor.py", line 55, in execute_model
ERROR 01-16 19:39:14 engine.py:159] output = self.driver_worker.execute_model(execute_model_req)
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/worker/tt_worker.py", line 370, in execute_model
ERROR 01-16 19:39:14 engine.py:159] output = self.model_runner.execute_model(
ERROR 01-16 19:39:14 engine.py:159] File "/tt-metal/python_env/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 01-16 19:39:14 engine.py:159] return func(*args, **kwargs)
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/worker/tt_model_runner.py", line 360, in execute_model
ERROR 01-16 19:39:14 engine.py:159] next_token_ids = self._execute_model_single_step(model_input, kv_caches, is_decode, async_out_proc_per_trace, step_idx=i)
ERROR 01-16 19:39:14 engine.py:159] File "/home/user/vllm/vllm/worker/tt_model_runner.py", line 462, in _execute_model_single_step
ERROR 01-16 19:39:14 engine.py:159] outputs = self.model.prefill_forward(**execute_model_kwargs)
ERROR 01-16 19:39:14 engine.py:159] TypeError: prefill_forward() missing 1 required positional argument: 'images'
ERROR: Exception in ASGI application
...
Traceback (most recent call last):
File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
File "/tt-metal/python_env/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 189, in __call__
response_sent.set()
File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_utils.py", line 83, in collapse_excgroups
raise exc
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
await func()
File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 242, in stream_response
async for chunk in self.body_iterator:
File "/home/user/vllm/vllm/entrypoints/openai/serving_completion.py", line 262, in completion_stream_generator
async for prompt_idx, res in result_generator:
File "/home/user/vllm/vllm/utils.py", line 506, in merge_async_iterators
item = await d
File "/home/user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
raise request_output
TypeError: prefill_forward() missing 1 required positional argument: 'images'
CRITICAL 01-16 19:39:14 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO: 127.0.0.1:54866 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
INFO: Shutting down
Metal | INFO | Disabling and clearing program cache on device 4
Metal | INFO | Disabling and clearing program cache on device 0
Metal | INFO | Closing device 4
Metal | INFO | Disabling and clearing program cache on device 4
Metal | INFO | Closing device 0
Metal | INFO | Disabling and clearing program cache on device 0
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [262]
DEBUG 01-16 19:39:15 client.py:157] Shutting down MQLLMEngineClient check health loop.
DEBUG 01-16 19:39:15 client.py:224] Shutting down MQLLMEngineClient output handler.
Device | INFO | Closing user mode device drivers
Ideally the request should either be handled, or a 400 error describing the missing "images" argument should be returned; it should not crash the model backend.
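One possible mitigation, sketched below purely for illustration (this is not the actual tt-metal or vLLM code, and the argument/method names are assumptions), would be to treat `images` as optional in the prefill path so that text-only requests bypass the vision branch instead of raising a TypeError:

```python
# Hypothetical guard around the multimodal prefill call. The names
# `prefill_forward` / `prefill_forward_text` and the kwargs are illustrative,
# not the real generator_vllm.py or tt_model_runner.py signatures.
def safe_prefill_forward(model, tokens, images=None, **kwargs):
    if images is None and hasattr(model, "prefill_forward_text"):
        # Text-only request: skip the vision path entirely.
        return model.prefill_forward_text(tokens, **kwargs)
    # Multimodal request (or no text-only path available): pass images through.
    return model.prefill_forward(tokens, images=images, **kwargs)
```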