rese1f

Highlights

  • Pro

Organizations

@CVNext


LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

Python 1,462 127 Updated Oct 29, 2024

[ICCV 2023] StableVideo: Text-driven Consistency-aware Diffusion Video Editing

Python 1,389 87 Updated Sep 7, 2023

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

Python 524 41 Updated Oct 30, 2024

🔥 Aurora Series: A more efficient multimodal large language model series for video.

Python 41 4 Updated Oct 28, 2024
Python 259 15 Updated Nov 5, 2024
Python 145 8 Updated Nov 4, 2024
Jupyter Notebook 466 22 Updated Aug 23, 2024

A Video Tokenizer Evaluation Dataset

Python 38 1 Updated Nov 6, 2024

A suite of image and video neural tokenizers

Python 627 14 Updated Nov 8, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,991 175 Updated Oct 4, 2024

[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training

Python 78 2 Updated Nov 7, 2024

This is the project website for the paper "Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making".

HTML 2 1 Updated Nov 8, 2024

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

Jupyter Notebook 2,234 150 Updated Nov 7, 2024

Open Overleaf/ShareLaTeX projects in VS Code, with full collaboration support.

TypeScript 514 11 Updated Nov 8, 2024

An official implementation of Pangu-Weather

Python 1,088 201 Updated Jan 12, 2024

[CVPR 2024] Official implementation of CVPR 2024 paper: "Inversion-Free Image Editing with Natural Language"

Python 287 8 Updated May 28, 2024

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 1,282 163 Updated Oct 23, 2024
2 Updated Oct 31, 2024

Let's finetune video generation models!

Python 166 3 Updated Nov 10, 2024
Python 917 92 Updated Nov 6, 2024
Python 12 Updated Nov 8, 2024

Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜

Jupyter Notebook 865 82 Updated Sep 11, 2024

A tutorial collection of codebases and papers on building world-class text-to-image generative models.

2 Updated Oct 25, 2024

[Preprint] TRACE: Temporal Grounding Video LLM via Causal Event Modeling

Python 36 Updated Nov 8, 2024
Python 102 3 Updated Aug 23, 2023

Estimating Body and Hand Motion in an Ego-sensed World

Python 150 11 Updated Nov 5, 2024

[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model

Python 341 13 Updated Oct 29, 2024

OpenEQA: Embodied Question Answering in the Era of Foundation Models

Jupyter Notebook 233 21 Updated Sep 20, 2024