Frank Fundel · Johannes Schusterbauer · Vincent Tao Hu · Björn Ommer
CompVis @ LMU Munich, MCML
WACV 2025
We present DistillDIFT, a highly efficient approach to semantic correspondence that delivers state-of-the-art performance with significantly reduced computational cost. Unlike traditional methods that combine multiple large generative models, DistillDIFT uses a novel distillation technique to unify the strengths of two vision foundation models into a single, streamlined model. By integrating 3D data without requiring human annotations, DistillDIFT further improves accuracy.
Overall, our empirical results show that the distilled model with 3D data augmentation outperforms current state-of-the-art methods while significantly reducing computational load, making it more practical for real-world applications such as semantic video correspondence.
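Feature-based semantic correspondence is typically evaluated by transferring keypoints from a source image to a target image through nearest-neighbour matching of dense features. The minimal NumPy sketch below shows that generic matching step. It assumes both (C, H, W) feature maps are already extracted on a shared grid. The function name match_keypoints and its array layout are hypothetical and are not this repository's API.

```python
import numpy as np

def match_keypoints(feat_src, feat_tgt, src_points):
    """Hypothetical helper: transfer keypoints by nearest-neighbour feature matching.

    feat_src, feat_tgt: (C, H, W) dense feature maps, assumed to share one grid.
    src_points: (N, 2) integer array of (x, y) coordinates on that grid.
    Returns an (N, 2) array of predicted (x, y) coordinates in the target map.
    """
    c, h, w = feat_tgt.shape
    # Flatten the target map to (H*W, C) and L2-normalise every feature vector.
    tgt = feat_tgt.reshape(c, -1).T
    tgt = tgt / (np.linalg.norm(tgt, axis=1, keepdims=True) + 1e-8)
    # Gather the source feature at each query keypoint: (C, N) -> (N, C).
    xs, ys = src_points[:, 0], src_points[:, 1]
    src = feat_src[:, ys, xs].T
    src = src / (np.linalg.norm(src, axis=1, keepdims=True) + 1e-8)
    # Cosine similarity between each query and every target location: (N, H*W).
    sim = src @ tgt.T
    best = sim.argmax(axis=1)
    # Convert flat indices back to (x, y) grid coordinates.
    return np.stack([best % w, best // w], axis=1)
```

Cosine similarity is used so that feature magnitude does not affect the match. Real pipelines often also upsample features and rescale coordinates to image resolution, which this sketch omits.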
This setup was tested with Ubuntu 22.04.4 LTS, CUDA 12.2, and Python 3.9.20.
First, clone the GitHub repository:
git clone git@github.com:CompVis/distilldift.git
cd distilldift
Our evaluation pipeline for SPair-71k is based on Telling-Left-From-Right, for comparability with prior results.
Follow their environment setup and data preparation instructions, but first change into the eval directory:
cd eval
Then run the evaluation script:
bash eval_distilldift.sh
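SPair-71k results are commonly reported as PCK@α (percentage of correct keypoints): a transferred keypoint counts as correct if it lies within α times the larger side of the target object's bounding box. The sketch below computes that metric for one image pair, assuming pixel coordinates and a known bounding-box size. The helper name pck is hypothetical, and the evaluation script may aggregate scores differently (for example per image versus per keypoint).

```python
import numpy as np

def pck(pred, gt, bbox_size, alpha=0.1):
    """Hypothetical helper: fraction of keypoints within alpha * bbox_size of ground truth.

    pred, gt: (N, 2) sequences of (x, y) keypoint coordinates in pixels.
    bbox_size: max(height, width) of the target object's bounding box, in pixels.
    """
    # Euclidean distance between each predicted and ground-truth keypoint.
    dist = np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=1)
    # A keypoint is correct if its error is within the scaled threshold.
    return float(np.mean(dist <= alpha * bbox_size))
```

For example, with a 100-pixel bounding box and α = 0.1, the threshold is 10 pixels, so errors of 2 and 30 pixels give a PCK of 0.5.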
For training, first change into the train directory from the repository root:
cd train
Then either set up a virtual environment and install all required packages with pip:
pip install -r requirements.txt
or, if you prefer conda, create the conda environment:
conda env create -f environment.yaml
For unsupervised training, download the COCO dataset and embed the images:
bash datasets/download_coco.sh
python embed.py --dataset_name COCO
Then launch one of the following training modes (these commands assume 4 GPUs, set via --num_processes 4):
- Unsupervised Distillation
accelerate launch --multi_gpu --num_processes 4 train.py distilled_us --dataset_name COCO --use_cache
- Weakly Supervised Distillation
accelerate launch --multi_gpu --num_processes 4 train.py distilled_ws --dataset_name SPair-71k --use_cache
- Supervised Training
accelerate launch --multi_gpu --num_processes 4 train.py distilled_s --dataset_name SPair-71k --use_cache
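The README does not spell out the distillation objective. Purely to illustrate the general idea of distilling correspondence behaviour from a teacher into a single student, the sketch below computes a KL divergence between the teacher's and the student's soft-matching distributions. The teacher features (for instance, hypothetically, features combined from the two foundation models), the name correspondence_kl, the temperature tau, and the loss form are all assumptions and not the authors' method. It only evaluates the loss in NumPy; actual training would use an autodiff framework.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def normalize(f):
    # L2-normalise feature vectors along the channel axis.
    return f / (np.linalg.norm(f, axis=-1, keepdims=True) + 1e-8)

def correspondence_kl(student_src, student_tgt, teacher_src, teacher_tgt, tau=0.1):
    """Hypothetical loss: KL(teacher || student) between soft-matching distributions.

    *_src: (N, C) features at N query points in the source image.
    *_tgt: (M, C) features at M candidate locations in the target image.
    Teacher and student may use different channel sizes C.
    """
    # Each row is a distribution over the M target locations for one query.
    p_t = softmax(normalize(teacher_src) @ normalize(teacher_tgt).T / tau)
    p_s = softmax(normalize(student_src) @ normalize(student_tgt).T / tau)
    kl_per_query = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1)
    # Average the per-query divergence over the N query points.
    return float(np.mean(kl_per_query))
```

Matching distributions rather than raw features lets teacher and student differ in feature dimensionality, since only their similarity structure is compared.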
To train with 3D data, follow the official instructions to download the CO3D dataset, then prepare it:
python datasets/create_co3d.py
Then run the training:
accelerate launch --multi_gpu --num_processes 4 train.py distilled_s --dataset_name CO3D --use_cache
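One reason 3D data can provide supervision without human annotation is that, given known camera parameters, projecting the same 3D point into two views yields a ground-truth correspondence. The sketch below illustrates this with a pinhole camera model, assuming world-space points plus per-view intrinsics K and extrinsics (R, t). The function names are hypothetical, and this is not necessarily what create_co3d.py does; in particular, it ignores occlusion.

```python
import numpy as np

def project(points_world, K, R, t):
    """Project (N, 3) world points to pixels with a pinhole camera: x ~ K (R X + t)."""
    cam = points_world @ R.T + t      # world -> camera coordinates, (N, 3)
    uvw = cam @ K.T                   # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide

def correspondences_from_3d(points_world, cam_a, cam_b):
    """Hypothetical helper: pair the projections of the same 3D points in two views.

    cam_a, cam_b: (K, R, t) tuples. Points behind either camera are dropped;
    occlusion by other surfaces is NOT handled.
    """
    keep = np.ones(len(points_world), dtype=bool)
    for K, R, t in (cam_a, cam_b):
        # Keep only points with positive depth in this camera.
        keep &= (points_world @ R.T + t)[:, 2] > 0
    pa = project(points_world[keep], *cam_a)
    pb = project(points_world[keep], *cam_b)
    return pa, pb
```

Row i of pa and row i of pb then form one correspondence between the two views.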
Please cite our paper:
@inproceedings{fundel2025distilldift,
author = {Frank Fundel and Johannes Schusterbauer and Vincent Tao Hu and Björn Ommer},
title = {Distillation of Diffusion Features for Semantic Correspondence},
booktitle = {WACV},
year = {2025},
}