You are welcome to add new multimodal works, fix errors, or make any other modifications that help make this awesome list more useful or interesting. Click here to find the contribution tutorial. We promise that your pull requests will be processed within 24 hours. Thank you for your contributions.
Paper | Venue | Time | Link | Notes |
---|---|---|---|---|
Knowledge Mechanisms in Large Language Models: A Survey and Perspective | Arxiv | 2024.07 | Arxiv | Categorises the knowledge mechanisms of LLMs into knowledge utilization and knowledge evolution. Knowledge utilization delves into the mechanisms of memorization, comprehension and application, and creation; knowledge evolution focuses on the dynamic progression of knowledge within individual and group LLMs. |
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Arxiv | 2024.05 | Arxiv | Introduces background knowledge of the 3D field and the revolutionary changes that LLMs have brought to it. |
Large Multimodal Agents: A Survey | Arxiv | 2024.02 | Arxiv | 🕐 Coming soon... |
The Revolution of Multimodal Large Language Models: A Survey | ACL 2024 | V1:2024.02 - V2:2024.06 | Arxiv | 🕐 Coming soon... |
MM-LLMs: Recent Advances in MultiModal Large Language Models | ACL 2024 | V1:2024.01 - V5:2024.05 | Arxiv | 🕐 Coming soon... |
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning | Arxiv | 2024.01 | Arxiv | 🕐 Coming soon... |
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey | Arxiv | 2023.12 | Arxiv | 🕐 Coming soon... |
Multimodal Large Language Models: A Survey | BigData 2023 | 2023.11 | Arxiv | 🕐 Coming soon... |
A Survey on Multimodal Large Language Models for Autonomous Driving | WACV 2024 | 2023.11 | Arxiv | 🕐 Coming soon... |
Multimodal Foundation Models: From Specialists to General-Purpose Assistants | Arxiv | 2023.09 | Arxiv | 🕐 Coming soon... |
Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models | Arxiv | 2023.08 | Arxiv | 🕐 Coming soon... |
A Survey on Multimodal Large Language Models | Arxiv | 2023.06 | Arxiv | Discusses MLLMs from four perspectives: Multimodal Instruction Tuning, Multimodal In-Context Learning, Multimodal Chain of Thought, and LLM-Aided Visual Reasoning. |
Title | Time | Link | Notes |
---|---|---|---|
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | 2024.01 | Paper | The authors defined a "CLIP-blind pair" as two images that appear visually dissimilar but have very similar features according to CLIP's output. They also utilized GPT to summarize the characteristics of images that the model finds challenging to recognize. |
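
For the row above, the core signal is the similarity between CLIP image embeddings: a "CLIP-blind pair" is visually dissimilar yet nearly identical in CLIP feature space. Below is a minimal sketch, assuming the Hugging Face `transformers` CLIP implementation and placeholder image paths; the 0.95 cutoff is an illustrative assumption, not the paper's exact protocol.

```python
# Minimal sketch: flagging a candidate "CLIP-blind pair" by comparing CLIP
# image embeddings. Image paths are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(path_a: str, path_b: str) -> float:
    """Cosine similarity between the CLIP image embeddings of two images."""
    images = [Image.open(path_a), Image.open(path_b)]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return (feats[0] @ feats[1]).item()

# Visually different images whose similarity exceeds a high threshold
# (0.95 here is an assumption) are candidate CLIP-blind pairs worth
# turning into VQA probes.
if clip_similarity("img_a.jpg", "img_b.jpg") > 0.95:
    print("candidate CLIP-blind pair")
```
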
Name | Time | Modal | Params | Link | Notes |
---|---|---|---|---|---|
Qwen2-VL | 2024.08 | Image, Video, Language | 2B, 7B, 72B | Github | |
PaliGemma | 2024.07 | Image, Language | 3B | Paper | VL large model focused on transfer learning |
MM1 | 2024.03 | Image, Language | 3B, 7B, 30B | Paper, Github | Ablation experiments are performed on model architecture decisions and pre-training data choices to determine the optimal configuration |
MiniCPM-V | 2024.02 | Image, Language | 2B, 8B | Paper, Github | Lightweight VL models focusing on end-side deployment |
InternVL | 2023.12 | Image, Language | 14B, 40B | Paper | |
LLaVA | 2023.04 | Image, Language | V1: 7B, 13B; V1.5: 7B, 13B; V1.6: 7B, 13B, 34B | Page, Paper1, Paper2, Github | The first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data; trains an end-to-end large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding. |
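
The LLaVA row above describes connecting a vision encoder to an LLM. Below is a minimal sketch of that connection scheme under assumed dimensions (CLIP ViT-L/14 patch features projected into a 4096-dim LLM embedding space); the module names are illustrative, not the released code. LLaVA-1 uses a single linear projection and LLaVA-1.5 a two-layer MLP, which is what the sketch shows.

```python
# Minimal sketch of a LLaVA-style vision-language connection: frozen
# vision-encoder patch features are projected into the LLM's embedding
# space and prepended to the text token embeddings. Dimensions and names
# are illustrative assumptions.
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # LLaVA-1 uses a single linear layer; LLaVA-1.5 uses a 2-layer MLP.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, vision_dim) from a frozen vision encoder
        # text_embeds: (batch, num_text_tokens, llm_dim) from the LLM's embedding table
        image_tokens = self.proj(patch_feats)
        # The concatenated sequence is fed to the LLM as ordinary input embeddings.
        return torch.cat([image_tokens, text_embeds], dim=1)

connector = VisionLanguageConnector()
fused = connector(torch.randn(1, 576, 1024), torch.randn(1, 32, 4096))
print(fused.shape)  # torch.Size([1, 608, 4096])
```
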
Datasets | Time | Modal | Scale | Annotation | Data sources | Link | Notes |
---|---|---|---|---|---|---|---|
FILIP300M | 2021.11 | Vision, Language | 300M image-text pairs | image-text pairs | Internet | Paper | Removes images whose shorter dimension is smaller than 200 pixels and whose aspect ratio is larger than 3; keeps only English texts, excluding meaningless ones; discards image-text pairs whose texts are repeated over 10 times. |
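
The FILIP300M notes above amount to three filtering rules. The sketch below restates them as code; the helper names and the rough English-text/deduplication heuristics are assumptions for illustration, not the authors' released pipeline.

```python
# Sketch of the FILIP300M filtering rules listed above. Helper names and the
# rough English-text/deduplication heuristics are illustrative assumptions.
from collections import Counter
from PIL import Image

def keep_image(path: str) -> bool:
    """Drop images whose shorter side is < 200 px and whose aspect ratio is > 3."""
    w, h = Image.open(path).size
    short, long_ = min(w, h), max(w, h)
    return not (short < 200 and long_ / short > 3)

def keep_text(text: str) -> bool:
    """Keep only non-empty, roughly English (ASCII) texts."""
    return len(text.strip()) > 0 and text.isascii()

def filter_pairs(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """pairs: (image_path, caption). Also drops captions repeated over 10 times."""
    counts = Counter(caption for _, caption in pairs)
    return [
        (img, cap) for img, cap in pairs
        if counts[cap] <= 10 and keep_text(cap) and keep_image(img)
    ]
```
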
Name | Time | Task | Link |
---|---|---|---|
MM-Vet v2 | 2024.08 | Recognition, Knowledge, OCR, Spatial awareness, Language generation, Math, image-text sequence understanding | Paper, Github |
MMVP | 2024.01 | VQA for "CLIP-blind Pairs" | Page, Paper, Github |
MM-Vet | 2023.08 | Recognition, Knowledge, OCR, Spatial awareness, Language generation, Math | Paper, Github |
MME | 2023.06 | 14 subtasks: Existence, Count, OCR, Poster, Celebrity, Commonsense Reasoning, Text Translation... | Paper, Github |
Perception Test | 2023.05 | object tracking, point tracking, temporal action localisation, temporal sound localisation, multiple-choice video question-answering, grounded video question-answering | Paper, Github |
Name | Link | Notes |
---|---|---|
LoRA | Paper | Reduces memory usage during training by freezing the full-rank weights and optimizing only two low-rank matrices obtained from a low-rank decomposition of the weight update (see the sketch after this table). |
Data Filtering Networks(DFN) | Paper | A CLIP model with high accuracy on downstream tasks is not necessarily a good data filtering model; a small amount of high-quality pre-training data is more important. |
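
For the LoRA row above, the saving comes from freezing the full-rank weight and training only the two low-rank factors, so the optimizer tracks r·(d_in + d_out) parameters instead of d_in·d_out. A minimal sketch as a plain PyTorch `nn.Linear` wrapper, with illustrative rank and scaling values:

```python
# Minimal LoRA wrapper: the base weight is frozen and only the low-rank
# factors A and B are trained. Rank and scaling values are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the full-rank layer
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # Gaussian init
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # zero init => LoRA branch starts at zero
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + (x A^T B^T) * scaling; gradients flow only through A and B
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65,536 low-rank params vs ~16.8M in the frozen layer
```
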
Name | Link | Scope |
---|---|---|
Awesome-LLM-Tabular | Github | Tabular, LLM |
Awesome-LLM-3D | Github | 3D, LLM |
Awesome-Multimodal-Large-Language-Models | Github | Multimodal, Dataset, LLM |
Awesome-Multimodal-Papers | Github | LargeModel, Benchmark, Task, Dataset |