
Next-Token Prediction is All You Need

Python 845 25 Updated Sep 30, 2024

[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World

Python 448 14 Updated Aug 9, 2024

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 4,930 403 Updated Oct 2, 2024

MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Python 243 9 Updated Sep 30, 2024

Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Python 2,474 140 Updated Oct 4, 2024

Official repository for paper "Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving"

18 Updated Sep 5, 2024

Python 5 Updated Jul 31, 2024

EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Python 509 43 Updated Sep 19, 2024

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Python 274 24 Updated Jun 6, 2024

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1,351 71 Updated Oct 6, 2024

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Python 811 41 Updated Oct 6, 2024

This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"

Python 113 7 Updated Aug 11, 2024

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 2,888 208 Updated Sep 25, 2024

Official repository for the paper PLLaVA

Python 573 38 Updated Jul 28, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 192 26 Updated Aug 15, 2024

[NeurIPS 2024 D&B Track] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Python 1,237 44 Updated Aug 7, 2024

An open source implementation of CLIP.

Python 9,938 959 Updated Aug 19, 2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

Python 59 3 Updated Jun 16, 2024

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Python 59 5 Updated Jan 30, 2024

[MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501

Python 37 3 Updated Jul 26, 2024

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Python 652 36 Updated Aug 5, 2024

HPHS: Hierarchical Planning based on Hybrid Frontier Sampling for Unknown Environments Exploration

Python 16 Updated Jul 19, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,785 108 Updated Jul 29, 2024

A collection of papers on Diffusion for Image-to-Image Translation and Style Transfer

Python 106 14 Updated Oct 7, 2024

A collection of awesome resources on image-to-image translation.

1,167 119 Updated Sep 3, 2024

LLM101n: Let's build a Storyteller

29,193 1,599 Updated Aug 1, 2024

A Framework of Small-scale Large Multimodal Models

Python 597 53 Updated Sep 10, 2024

Codebase for the WayveScenes101 Dataset

Python 157 5 Updated Sep 25, 2024

A collection of visual instruction tuning datasets.

Python 75 3 Updated Mar 14, 2024

Official Code Release of Delphi

45 Updated Jun 4, 2024