[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Official implementation of "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
A live list of papers on game-playing agents and large multimodal models, accompanying the survey "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".
An up-to-date curated list of state-of-the-art research, papers, and resources on hallucinations in large vision-language models.
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
[ICASSP 2025] Open-source code for the paper "Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification"
[ICLR 2024 Spotlight 🔥] [Best Paper Award, SoCal NLP 2023 🏆] Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
[ICML 2024] Official code for "Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data"
[Under review] Assessing and Learning Alignment of Unimodal Vision and Language Models
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
A curated list of research on continual learning with pretrained models.
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
Awesome Vision-Language Compositionality: a comprehensive curation of research papers from the literature.
Official implementation of "Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models" (ECCV 2024)
Repo for the paper "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities"
PicQ: Demo for MiniCPM-V 2.6 to answer questions about images using natural language (a minimal usage sketch follows this list).
TOCFL-MultiBench: A multimodal benchmark for evaluating Chinese language proficiency across text, audio, and visual data. Features a Selective Token Constraint Mechanism (STCM) for improved decoding stability.
VidiQA: Demo for MiniCPM-V 2.6 to answer questions about videos using natural language.
Code for "Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training" (IJCV 2024) and "Rethinking the Role of Pre-Trained Networks in Source-Free Domain Adaptation" (ICCV 2023)
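The PicQ and VidiQA entries above are demos built on MiniCPM-V 2.6. As a rough illustration of how such a demo queries the model, here is a minimal image-QA sketch using Hugging Face Transformers; the openbmb/MiniCPM-V-2_6 model ID and its trust_remote_code chat() helper follow that model's card, and the exact arguments may vary across revisions.

```python
# Minimal sketch: image QA with MiniCPM-V 2.6 (as in the PicQ demo).
# Assumptions: the openbmb/MiniCPM-V-2_6 HF repo and its custom chat()
# interface, per the model card; "example.jpg" is a hypothetical local file.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "openbmb/MiniCPM-V-2_6"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = (
    AutoModel.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,      # the repo ships the custom .chat() code
        torch_dtype=torch.bfloat16,
    )
    .eval()
    .cuda()
)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": [image, "What is shown in this image?"]}]

# chat() handles image preprocessing and text generation in one call.
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```

Video QA (as in VidiQA) follows the same pattern, with a list of sampled video frames placed in the message content instead of a single image.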