- University of Washington
- Seattle, US
- https://rese1f.github.io/
- @wenhaocha1
- rese1f
- in/wenhao-chai-658274238
Highlights
- Pro
Stars
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
[ICCV 2023] StableVideo: Text-driven Consistency-aware Diffusion Video Editing
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
🔥 Aurora Series: A more efficient multimodal large language model series for video.
A suite of image and video neural tokenizers
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
This is the project website for the paper "Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making".
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
Open Overleaf/ShareLaTeX projects in VS Code, with full collaboration support.
An official implementation of Pangu-Weather
[CVPR 2024] Official implementation of the paper "Inversion-Free Image Editing with Natural Language"
OpenVLA: An open-source vision-language-action model for robotic manipulation (openvla/openvla, forked from TRI-ML/prismatic-vlms).
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
A tutorial collection of codebases and papers on building the world's best text-to-image generative models.
[Preprint] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Estimating Body and Hand Motion in an Ego-sensed World
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
OpenEQA: Embodied Question Answering in the Era of Foundation Models