An open source implementation of CLIP.
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
A concise but complete implementation of CLIP with various experimental improvements from recent papers
A curated list of Visual Question Answering (VQA, including image/video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Build high-performance AI models with modular building blocks
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
The official repository of Achelous and Achelous++
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Official PyTorch repository for CG-DETR, "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
A Python tool to perform deep learning experiments on multimodal remote sensing data.
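Several of the entries above are CLIP implementations. As a minimal sketch of how such an implementation is typically used for zero-shot image-text matching, the snippet below assumes the open_clip package; the model name, pretrained tag, captions, and image path are placeholder assumptions, not taken from any listed repository.

```python
# Hypothetical zero-shot image-text matching sketch with open_clip
# (model name, pretrained tag, captions, and file path are placeholder assumptions).
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder image path
texts = tokenizer(["a diagram", "a dog", "a cat"])          # placeholder captions

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    # Normalize embeddings so the dot product is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Scaled similarities turned into a probability distribution over the captions.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)
```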