ModalBed is a PyTorch-based framework designed to facilitate reproducible and solid research in modality generalization, as introduced in *Towards Modality Generalization: A Benchmark and Prospective Analysis*.
ModalBed targets the problem of Modality Generalization (MG): a learner trained on multiple modalities (e.g., video or audio) should perform well on modalities unseen during training (e.g., depth).
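To make the setting concrete, below is a minimal, self-contained PyTorch sketch of the MG evaluation loop. Everything here (the `train_envs`/`test_env` structure, the 512-d embeddings, the linear probe) is an illustrative assumption rather than ModalBed's actual API.

```python
import torch
import torch.nn as nn

def evaluate_mg(model, train_envs, test_env):
    """Train on the seen modalities, then report accuracy on the unseen one."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(100):  # a few full-batch training steps
        for x, y in train_envs:  # e.g., video and audio embeddings
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    x_test, y_test = test_env  # e.g., depth embeddings, unseen during training
    with torch.no_grad():
        acc = (model(x_test).argmax(dim=1) == y_test).float().mean()
    return acc.item()

# Toy stand-ins for 512-d perceptor embeddings with 10 classes.
make_env = lambda: (torch.randn(64, 512), torch.randint(0, 10, (64,)))
model = nn.Linear(512, 10)
print(evaluate_mg(model, [make_env(), make_env()], make_env()))
```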
ModalBed is an ongoing project that will be continually updated with new results, algorithms, and datasets. Contributions from fellow researchers through pull requests are highly encouraged and welcomed :).
See CONTRIBUTING.md for details on how to contribute more algorithms, datasets, and perceptors.
The currently supported algorithms are listed below (a minimal ERM sketch follows the list):

- Feature Concatenation (Concat)
- On-the-fly Gradient Modulation (OGM)
- Dynamically Learning Modality Gap (DLMG)
- Empirical Risk Minimization (ERM)
- Inter-domain Mixup (Mixup)
- Class-conditional DANN (CDANN)
- Style Agnostic Networks (SagNet)
- Information Bottleneck (IB_ERM)
- Conditional Contrastive Adversarial Domain (CondCAD)
- Empirical Quantile Risk Minimization (EQRM)
- Adaptive Risk Minimization (ARM)
- ERM++: An Improved Baseline for Domain Generalization (ERM++)
- Invariant Risk Minimization (IRM)
- Additive Disentanglement of Domain Features with Remix Loss (ADRMX)
- Uniform Risk Minimization (URM)
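For orientation, algorithms in DomainBed-style codebases are typically classes exposing an `update` step over per-environment minibatches. The sketch below shows how ERM could look under that convention, treating each training modality as one environment; the class interface and hyperparameter names are assumptions, not ModalBed's exact signatures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ERM(nn.Module):
    """Empirical Risk Minimization: pool all training modalities into one loss."""
    def __init__(self, feature_dim, num_classes, hparams):
        super().__init__()
        self.classifier = nn.Linear(feature_dim, num_classes)
        self.optimizer = torch.optim.Adam(
            self.classifier.parameters(), lr=hparams.get("lr", 1e-3))

    def update(self, minibatches):
        # minibatches: one (x, y) pair per training modality ("environment")
        x = torch.cat([x for x, _ in minibatches])
        y = torch.cat([y for _, y in minibatches])
        loss = F.cross_entropy(self.classifier(x), y)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return {"loss": loss.item()}

    def predict(self, x):
        return self.classifier(x)

# Usage with toy data:
erm = ERM(feature_dim=512, num_classes=10, hparams={"lr": 1e-3})
print(erm.update([(torch.randn(32, 512), torch.randint(0, 10, (32,)))]))
```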
The currently supported datasets are:

- MSR-VTT: MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
- NYUDv2: Indoor Segmentation and Support Inference from RGBD Images
- VGGSound: VGGSound: A Large-scale Audio-Visual Dataset
- TVL: A Touch, Vision, and Language Dataset for Multimodal Alignment
- MOSEI: CMU Multimodal Opinion Sentiment and Emotion Intensity
- LLVIP: A Visible-infrared Paired Dataset for Low-light Vision
- AudioSet: A sound vocabulary and dataset
Download the datasets:

```bash
python3 -m modalbed.scripts.download --dataset="msr_vtt"
```
Download the perceptors (pre-trained multi-modal encoders such as ImageBind, used to embed each modality into a shared feature space):

```bash
python3 -m modalbed.scripts.download --perceptor="imagebind"
```
Train a model:

```bash
python3 -m modalbed.scripts.train --data_dir=./datasets/ --algorithm ERM --dataset NYUDv2 --test_env 0 --perceptor imagebind
```
Launch a sweep (here, 3 sampled hyperparameter sets, each run for 3 trials):

```bash
CUDA_VISIBLE_DEVICES=2,3 python -m modalbed.scripts.sweep launch --data_dir=./datasets/ --output_dir=./msrvtt_imagebind --command_launcher multi_gpu --datasets MSR_VTT --perceptor imagebind --n_hparams 3 --n_trials 3 --algorithms ERM IRM Mixup CDANN SagNet # ...
```
Collect the results (this automatically generates the LaTeX table in modalbed/results):

```bash
python -m modalbed.scripts.collect_results --mode=weak
```
- DomainBed, a suite to test domain generalization algorithms, on which ModalBed is built.
If you find this repository useful, please consider giving it a star ⭐ and a citation:
```bibtex
@misc{liu2024modalbed,
  title={Towards Modality Generalization: A Benchmark and Prospective Analysis},
  author={Xiaohao Liu and Xiaobo Xia and Zhuo Huang and Tat-Seng Chua},
  year={2024},
  eprint={2412.18277},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
}
```