[Bug]: Sending /v1/completions API request to TtMllamaForConditionalGeneration model crashes model backend #53

Closed
@tstescoTT

Description

Your current environment

vLLM branch: dev (last verified commit: 2f33504)
tt-metal branch: main (last verified commit: 47fb1a2)

Model Input Dumps

No response

🐛 Describe the bug

When running TtMllamaForConditionalGeneration (https://github.com/tenstorrent/tt-metal/blob/main/models/demos/llama3/tt/generator_vllm.py#L82) with meta-llama/Llama-3.2-11B-Vision-Instruct in vLLM, sending a text-only request to /v1/completions crashes the model backend and shuts down the server.
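
For reference, a minimal sketch of the kind of text-only request that triggers this. The host and port (`localhost:8000`), the prompt text, and `max_tokens` are assumptions, since the report does not state them. `stream` is set because the traceback passes through `completion_stream_generator`. The helper name `build_completion_request` is hypothetical.

```python
import json
import urllib.request

# Assumption: the server listens on localhost:8000; the report does not
# state the actual host or port.
BASE_URL = "http://localhost:8000"


def build_completion_request(prompt: str, max_tokens: int = 16) -> urllib.request.Request:
    """Build a text-only POST to /v1/completions without sending it."""
    payload = {
        "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
        "prompt": prompt,  # text only, no image input
        "max_tokens": max_tokens,
        "stream": True,  # the traceback goes through the streaming generator
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_completion_request("What is the capital of France?")
    print(req.get_method(), req.full_url)
    # To actually send it (this is the step that crashed the backend):
    # urllib.request.urlopen(req)
```

The server log for the failing request follows.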

INFO:     127.0.0.1:54850 - "POST /v1/completions HTTP/1.1" 200 OK
DEBUG 01-16 19:39:12 async_llm_engine.py:523] Building guided decoding logits processor. Params: GuidedDecodingParams(json=None, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None)
WARNING 01-16 19:39:14 preprocess.py:89] Falling back on <BOS> for decoder start token id because decoder start token id is not available.
INFO 01-16 19:39:14 engine.py:291] Added request cmpl-6ead0464a54b4f64a9d6f4e2a68aea74-0.
ERROR 01-16 19:39:14 engine.py:159] TypeError("prefill_forward() missing 1 required positional argument: 'images'")
ERROR 01-16 19:39:14 engine.py:159] Traceback (most recent call last):
ERROR 01-16 19:39:14 engine.py:159]   File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 157, in start
ERROR 01-16 19:39:14 engine.py:159]     self.run_engine_loop()
ERROR 01-16 19:39:14 engine.py:159]   File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 220, in run_engine_loop
ERROR 01-16 19:39:14 engine.py:159]     request_outputs = self.engine_step()
ERROR 01-16 19:39:14 engine.py:159]   File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 238, in engine_step
ERROR 01-16 19:39:14 engine.py:159]     raise e
ERROR 01-16 19:39:14 engine.py:159]   File "/home/user/vllm/vllm/engine/multiprocessing/engine.py", line 229, in engine_step
ERROR 01-16 19:39:14 engine.py:159]     return self.engine.step()
ERROR 01-16 19:39:14 engine.py:159]   File "/home/user/vllm/vllm/engine/llm_engine.py", line 1405, in step
ERROR 01-16 19:39:14 engine.py:159]     outputs = self.model_executor.execute_model(
ERROR 01-16 19:39:14 engine.py:159]   File "/home/user/vllm/vllm/executor/tt_executor.py", line 55, in execute_model
ERROR 01-16 19:39:14 engine.py:159]     output = self.driver_worker.execute_model(execute_model_req)
ERROR 01-16 19:39:14 engine.py:159]   File "/home/user/vllm/vllm/worker/tt_worker.py", line 370, in execute_model
ERROR 01-16 19:39:14 engine.py:159]     output = self.model_runner.execute_model(
ERROR 01-16 19:39:14 engine.py:159]   File "/tt-metal/python_env/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 01-16 19:39:14 engine.py:159]     return func(*args, **kwargs)
ERROR 01-16 19:39:14 engine.py:159]   File "/home/user/vllm/vllm/worker/tt_model_runner.py", line 360, in execute_model
ERROR 01-16 19:39:14 engine.py:159]     next_token_ids = self._execute_model_single_step(model_input, kv_caches, is_decode, async_out_proc_per_trace, step_idx=i)
ERROR 01-16 19:39:14 engine.py:159]   File "/home/user/vllm/vllm/worker/tt_model_runner.py", line 462, in _execute_model_single_step
ERROR 01-16 19:39:14 engine.py:159]     outputs = self.model.prefill_forward(**execute_model_kwargs)
ERROR 01-16 19:39:14 engine.py:159] TypeError: prefill_forward() missing 1 required positional argument: 'images'
ERROR:    Exception in ASGI application
...
Traceback (most recent call last):
  File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 409, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/tt-metal/python_env/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
  File "/tt-metal/python_env/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/middleware/base.py", line 189, in __call__
    response_sent.set()
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/_utils.py", line 83, in collapse_excgroups
    raise exc
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 253, in wrap
    await func()
  File "/tt-metal/python_env/lib/python3.8/site-packages/starlette/responses.py", line 242, in stream_response
    async for chunk in self.body_iterator:
  File "/home/user/vllm/vllm/entrypoints/openai/serving_completion.py", line 262, in completion_stream_generator
    async for prompt_idx, res in result_generator:
  File "/home/user/vllm/vllm/utils.py", line 506, in merge_async_iterators
    item = await d
  File "/home/user/vllm/vllm/engine/multiprocessing/client.py", line 598, in _process_request
    raise request_output
TypeError: prefill_forward() missing 1 required positional argument: 'images'
CRITICAL 01-16 19:39:14 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO:     127.0.0.1:54866 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
INFO:     Shutting down
                  Metal | INFO     | Disabling and clearing program cache on device 4
                  Metal | INFO     | Disabling and clearing program cache on device 0
                  Metal | INFO     | Closing device 4
                  Metal | INFO     | Disabling and clearing program cache on device 4
                  Metal | INFO     | Closing device 0
                  Metal | INFO     | Disabling and clearing program cache on device 0
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [262]
DEBUG 01-16 19:39:15 client.py:157] Shutting down MQLLMEngineClient check health loop.
DEBUG 01-16 19:39:15 client.py:224] Shutting down MQLLMEngineClient output handler.
                 Device | INFO     | Closing user mode device drivers

Ideally, the request should be handled; failing that, the server should return a 400 error describing the missing `images` argument. In either case, a single request should not crash the model backend.
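
One possible shape for the second option, as a rough sketch only: check the keyword arguments against the callee's signature before calling it, and turn a mismatch into a per-request 400 instead of letting the `TypeError` propagate into the engine loop. Everything named here (`RequestValidationError`, `checked_call`, `handle_request`, and the stand-in `prefill_forward`) is hypothetical and is not the vLLM or tt-metal code.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


class RequestValidationError(ValueError):
    """Hypothetical error that an API layer could translate into HTTP 400."""


def checked_call(func: Callable[..., Any], kwargs: Dict[str, Any]) -> Any:
    """Call func(**kwargs) only if the arguments match its signature."""
    try:
        # bind() raises TypeError for missing or unexpected arguments
        # without executing the function itself.
        inspect.signature(func).bind(**kwargs)
    except TypeError as exc:
        raise RequestValidationError(f"{func.__name__}(): {exc}") from exc
    return func(**kwargs)


def prefill_forward(tokens, images):
    """Stand-in with a required 'images' parameter; not the real model code."""
    return [len(tokens)]


def handle_request(kwargs: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """Return (status, body) for one request instead of raising."""
    try:
        return 200, {"outputs": checked_call(prefill_forward, kwargs)}
    except RequestValidationError as exc:
        return 400, {"error": str(exc)}
```

With this stand-in, `handle_request({"tokens": [1, 2, 3]})` yields a 400 whose message names `images`, while passing `images` explicitly (even as `None`) goes through. The first option, actually serving text-only requests, would instead need the model's `prefill_forward` to accept requests without images.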

Labels

P2, bug (Something isn't working)
