A collection of VLM papers, blogs, and projects, with a focus on VLMs for autonomous driving and related reasoning techniques.
- Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving [arXiv]
- EMMA: End-to-End Multimodal Model for Autonomous Driving [arXiv]
- VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions [ECCV 2024]
- HE-Drive: Human-Like End-to-End Driving with Vision Language Models [arXiv]
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving [ECCV 2024]
- Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving [CoRL 2024]
- LMDrive: Closed-Loop End-to-End Driving with Large Language Models [CVPR 2024]
- xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs [arXiv]