SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality

Authors: Chengyang Lei, Liyi Chen, Jun Cen, Xiao Chen, Zhen Lei, Felix Heide, Qifeng Chen, Zhaoxiang Zhang

SimCMF aims to transfer the ability of large RGB-based models to other modalities (e.g., Depth, Thermal, Polarization), which suffer from limited training data. For example, SimCMF enables the Segment Anything Model to handle modalities beyond RGB images.

Getting Started

First, clone the project and create the environment.

git clone https://github.com/mt-cly/SimCMF
cd SimCMF
conda create -n simcmf python=3.10
conda activate simcmf
pip install -r requirements.txt
# download the pretrained SAM ViT-B checkpoint
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
mv sam_vit_b_01ec64.pth checkpoint/sam

We provide a segmentation benchmark to study segmentation performance across various modalities.

Dataset	Supported Modalities	Link
IVRG_RGBNIR	NIR, NIR+RGB	download (1.0G)
RGB-Thermal-Glass	Thermal, Thermal+RGB	download (3.0G)
NYUDepthv2	Depth, HHA, Depth+RGB, HHA+RGB	download (1.6G)
pgsnet	AOLP+DOLP, AOLP+DOLP+RGB	download (15.5G)
zju-rgbp	AOLP+DOLP, AOLP+DOLP+RGB	download (0.3G)

Download one or all of the benchmarks from the links above, unzip them, and move them into the data folder. The file structure should be as follows.

--SimCMF
   |--data
     |--IVRG_RGBNIR
     |--NYUDepthv2
     |--pgsnet
     |--RGB-Thermal-Glass
     |--zju-rgbp
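
Before training, it can help to confirm that the unzipped folders landed where the layout above expects them. The following is a minimal sketch and not part of the repository: the helper name missing_datasets and the default data root "data" are assumptions made for illustration, and only the folder names are taken from the listing above.

```python
from pathlib import Path

# Folder names copied from the layout above.
EXPECTED_DATASETS = [
    "IVRG_RGBNIR",
    "NYUDepthv2",
    "pgsnet",
    "RGB-Thermal-Glass",
    "zju-rgbp",
]


def missing_datasets(data_root="data"):
    """Return the expected benchmark folders not found under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the SimCMF project root; "data" is the assumed relative path.
    missing = missing_datasets()
    if missing:
        print("Not found under data/:", ", ".join(missing))
    else:
        print("All five benchmark folders are present.")
```

Since downloading a single benchmark is enough to train on its modalities, a non-empty result is only a problem if it includes the dataset you intend to use.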

To train, run python train.py followed by optional arguments.

  -net         # tuning method. Options: {sam_full_finetune, sam_linear_probing, sam_mlp_adapter, sam_lora, sam_prompt, sam_prefix}
  -modality    # modality name. Options: {pgsnet_rgbp, pgsnet_p, rgbd, d, rgbhha, hha, nir, rgbnir, rgbt, t, zju-rgbp}
  -proj_type   # pre-projection applied before the foundation model. Options: {simcmf, baseline_a, baseline_b, baseline_c, baseline_d}
  -exp_name    # experiment name
  -val_freq    # number of epochs between validations. Default: 5
  -b           # batch size. Default: 4
  -lr          # learning rate. Suggested: 3e-4 for PEFT, 3e-5 for full fine-tuning
  -weights     # path to the trained weights to resume from

To use DDP (distributed data parallel), add -ddp to the command.
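
The options above can be combined into a single command line. The sketch below assembles one and rejects values outside the listed option sets. It is an illustration only: build_train_command is a hypothetical helper that does not exist in the repository, and the rule that every method other than sam_full_finetune uses the PEFT learning rate is an assumption drawn from the -lr note above.

```python
import shlex

# Option values copied from the argument list above.
NETS = {"sam_full_finetune", "sam_linear_probing", "sam_mlp_adapter",
        "sam_lora", "sam_prompt", "sam_prefix"}
MODALITIES = {"pgsnet_rgbp", "pgsnet_p", "rgbd", "d", "rgbhha", "hha",
              "nir", "rgbnir", "rgbt", "t", "zju-rgbp"}
PROJ_TYPES = {"simcmf", "baseline_a", "baseline_b", "baseline_c", "baseline_d"}


def build_train_command(net, modality, exp_name, proj_type="simcmf",
                        batch_size=4, lr=None, ddp=False):
    """Assemble a train.py argument list (hypothetical helper, not in the repo)."""
    if net not in NETS:
        raise ValueError(f"unknown -net value: {net}")
    if modality not in MODALITIES:
        raise ValueError(f"unknown -modality value: {modality}")
    if proj_type not in PROJ_TYPES:
        raise ValueError(f"unknown -proj_type value: {proj_type}")
    if lr is None:
        # Assumption: every method except full fine-tuning counts as PEFT.
        lr = "3e-5" if net == "sam_full_finetune" else "3e-4"
    cmd = ["python", "train.py",
           "-net", net,
           "-modality", modality,
           "-proj_type", proj_type,
           "-exp_name", exp_name,
           "-b", str(batch_size),
           "-lr", str(lr)]
    if ddp:
        cmd.append("-ddp")  # extra switch described above
    return cmd


if __name__ == "__main__":
    # "nir_lora_demo" is an arbitrary experiment name chosen for this example.
    print(shlex.join(build_train_command("sam_lora", "nir", "nir_lora_demo")))
```

The printed line can be pasted into a shell from the project root. Passing ddp=True appends -ddp as the final argument.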

We provide an example command for adapting SAM to the NIR modality in train.sh.

sh train.sh

Citation

@misc{ikemura2024robust,
      title={SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality},
      author={Lei, Chengyang and Chen, Liyi and Cen, Jun and Chen, Xiao and Lei, Zhen and Heide, Felix and Chen, Qifeng and Zhang, Zhaoxiang},
      year={2024},
      eprint={2409.08083},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
  }

Acknowledgements

The code is based on Medical-SAM-Adapter.
