Description
Is there an existing issue / discussion for this?
- I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- I have searched FAQ
Current Behavior
I gave the popular Sora-generated video (prompt: "A stylish woman walks down a Tokyo street...") to MiniCPM-V2.6/O-2.6 as input and asked it to "Characterize the video using a well-detailed description, capturing its essence and events."
The online demos return:
Error, please retry
When running the offline code with Transformers, the model returns:
The video presents a static, monochromatic sequence of frames, each depicting an unchanging landscape that appears to be a grayscale representation of a barren field or desert with no distinguishable features. There is a consistent absence of movement, characters, or any dynamic elements throughout the entire duration of the video. The lack of variation in color and content across all frames suggests a deliberate choice to maintain a singular visual theme, potentially emphasizing themes of desolation, stillness, or meditation on a specific concept without progression or narrative development.
I tried other videos and they all worked fine. I'm curious about what went wrong and how I should generate the correct results.
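Since the caption describes static grayscale frames, one possible explanation (an assumption, not something confirmed here) is that the frames of this particular file are decoded incorrectly before they reach the model. A minimal sketch of a check for that is below. The helper names `channel_gap`, `mean_abs_diff`, and `looks_degenerate`, as well as the tolerance values, are hypothetical and not part of the MiniCPM code; frames are assumed to be packed 8-bit RGB byte buffers of equal size.

```python
def channel_gap(rgb: bytes) -> int:
    """Largest R/G/B difference at any pixel; 0 means the frame is pure grayscale."""
    if len(rgb) == 0 or len(rgb) % 3 != 0:
        raise ValueError("expected a non-empty packed RGB buffer")
    return max(max(p) - min(p) for p in zip(rgb[0::3], rgb[1::3], rgb[2::3]))


def mean_abs_diff(a: bytes, b: bytes) -> float:
    """Average per-byte difference between two frames of the same size."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def looks_degenerate(frames, gap_tol: int = 2, motion_tol: float = 1.0) -> bool:
    """True if every frame is (nearly) grayscale and consecutive frames barely change.

    Both tolerances are illustrative guesses, not tuned values.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("no frames given")
    grayscale = all(channel_gap(f) <= gap_tol for f in frames)
    static = all(
        mean_abs_diff(a, b) <= motion_tol for a, b in zip(frames, frames[1:])
    )
    return grayscale and static


if __name__ == "__main__":
    # Synthetic 4x4 frames: one flat gray, one with saturated colors.
    gray = bytes([128, 128, 128] * 16)
    colorful = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 10, 200, 30] * 4)
    print(looks_degenerate([gray, gray, gray]))
    print(looks_degenerate([colorful, gray]))
```

If the sampled frames are available as PIL images, `image.convert("RGB").tobytes()` yields a buffer in this layout. A `True` result for this video would point at frame decoding rather than at the model itself; a `False` result would suggest looking elsewhere. The pure-Python loops are slow on full-resolution frames, so checking a handful of downscaled frames is enough.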
Expected Behavior
Generate a correct caption for the video.
Steps To Reproduce
- Upload the video from this URL: https://cdn.openai.com/sora/videos/tokyo-walk.mp4
- Input instruction: Characterize the video using a well-detailed description, capturing its essence and events.
- Click submit
Environment
Online demo / offline code from the official GitHub README
Anything else?
No response