[BUG] Failure results #724

### 是否已有关于该错误的issue或讨论？ | Is there an existing issue / discussion for this?

- [X] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions

### 该问题是否在FAQ中有解答？ | Is there an existing answer for this in FAQ?

- [X] 我已经搜索过FAQ | I have searched FAQ

### 当前行为 | Current Behavior

I used the popular Sora-generated video (prompt: "A stylish woman walks down a Tokyo street...") as input and asked MiniCPM-V 2.6 / MiniCPM-o 2.6 to "Characterize the video using a well-detailed description, capturing its essence and events."

The online demos return:
> Error, please retry

When I run the offline code with Transformers, the model returns:
> The video presents a static, monochromatic sequence of frames, each depicting an unchanging landscape that appears to be a grayscale representation of a barren field or desert with no distinguishable features. There is a consistent absence of movement, characters, or any dynamic elements throughout the entire duration of the video. The lack of variation in color and content across all frames suggests a deliberate choice to maintain a singular visual theme, potentially emphasizing themes of desolation, stillness, or meditation on a specific concept without progression or narrative development.

I tried other videos and they all worked fine. What went wrong with this one, and how can I get a correct result?
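
Since the offline output describes static gray frames, one check that might narrow this down is whether the decoded frames are already blank before they reach the model. The following is only a minimal sketch under stated assumptions: each frame is assumed to be available as a flat sequence of 0-255 grayscale pixel values, and the names `is_flat_frame` and `flat_frame_ratio` and the threshold are hypothetical, not part of the MiniCPM-V code.

```python
from statistics import pstdev
from typing import Sequence


def is_flat_frame(pixels: Sequence[int], std_threshold: float = 2.0) -> bool:
    """Return True if the pixel values barely vary (e.g. a solid gray frame)."""
    # std_threshold is an illustrative assumption, not a tuned value.
    return pstdev(pixels) < std_threshold


def flat_frame_ratio(frames: Sequence[Sequence[int]],
                     std_threshold: float = 2.0) -> float:
    """Return the fraction of frames that look blank (1.0 = all frames flat)."""
    if not frames:
        raise ValueError("no frames were decoded")
    flat = sum(is_flat_frame(frame, std_threshold) for frame in frames)
    return flat / len(frames)
```

If the sampled frames are Pillow images, `list(img.convert("L").getdata())` gives such a pixel sequence. A ratio near 1.0 would point at video decoding rather than at the model, but it would not by itself explain the cause.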

### 期望行为 | Expected Behavior

The model should generate a correct caption for the video.

### 复现方法 | Steps To Reproduce

1. Upload the video from this URL: https://cdn.openai.com/sora/videos/tokyo-walk.mp4
2. Enter the instruction: Characterize the video using a well-detailed description, capturing its essence and events.
3. Click Submit.

### 运行环境 | Environment

```Markdown
Online demo / offline code from the official GitHub README
```


### 备注 | Anything else?

_No response_
