DistillDIFT: Distillation of Diffusion Features for Semantic Correspondence (WACV 2025)

⚗️ DistillDIFT: Distillation of Diffusion Features for Semantic Correspondence

Frank Fundel · Johannes Schusterbauer · Vincent Tao Hu · Björn Ommer

CompVis @ LMU Munich, MCML

WACV 2025

Website Paper

TL;DR

We present DistillDIFT, a highly efficient approach to semantic correspondence that delivers state-of-the-art performance with significantly reduced computational cost. Unlike traditional methods that combine multiple large generative models, DistillDIFT uses a novel distillation technique to unify the strengths of two vision foundation models into a single, streamlined model. By integrating 3D data without requiring human annotations, DistillDIFT further improves accuracy.

Overall, our empirical results demonstrate that our distilled model with 3D data augmentation achieves superior performance to current state-of-the-art methods while significantly reducing computational load and enhancing practicality for real-world applications, such as semantic video correspondence.

🛠️ Setup

This setup was tested with Ubuntu 22.04.4 LTS, CUDA 12.2, and Python 3.9.20.
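To compare a local machine against the tested setup, something like the following sketch may help. The helper name report_version is hypothetical and not part of the repository; the script assumes bash and reports a missing tool instead of failing. Note that the driver query prints the NVIDIA driver version, not the CUDA version, which is shown in the header of plain nvidia-smi output.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: prints the first line of a
# tool's version output, or "not found" if the tool is not installed.
report_version() {
  local label="$1"; shift
  if command -v "$1" >/dev/null 2>&1; then
    echo "${label}: $("$@" 2>&1 | head -n 1)"
  else
    echo "${label}: not found"
  fi
}

report_version "OS" lsb_release -ds
report_version "Python" python3 --version
report_version "NVIDIA driver" nvidia-smi --query-gpu=driver_version --format=csv,noheader
```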

First, clone the GitHub repository and change into it:

git clone git@github.com:CompVis/distilldift.git
cd distilldift

🔬 Evaluation on SPair-71K

Our evaluation pipeline for SPair-71K is based on Telling-Left-From-Right for better comparability.

Follow their environment setup and data preparation instructions, but first change into the eval directory:

cd eval

Then run the evaluation script via

bash eval_distilldift.sh

🏋️ Training

First, change into the train directory via

cd train

Then either set up a virtual environment and install all required packages with pip via

pip install -r requirements.txt

or, if you prefer conda, create the conda environment via

conda env create -f environment.yaml

Download the COCO dataset and embed the images (for unsupervised training) via

bash datasets/download_coco.sh
python embed.py --dataset_name COCO

And run the training via

  • Unsupervised Distillation
    accelerate launch --multi_gpu --num_processes 4 train.py distilled_us --dataset_name COCO --use_cache
  • Weakly Supervised Distillation
    accelerate launch --multi_gpu --num_processes 4 train.py distilled_ws --dataset_name SPair-71k --use_cache
  • Supervised Training
    accelerate launch --multi_gpu --num_processes 4 train.py distilled_s --dataset_name SPair-71k --use_cache
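The three launch commands above differ only in the configuration name and the dataset. As a minimal sketch, assuming you want to vary the number of processes to match your GPU count, a small helper can print the command for inspection before you run it. The function name build_train_cmd is hypothetical and not part of the repository; the flags are exactly those used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: prints the launch command
# used in this README for a given configuration, dataset, and process count.
build_train_cmd() {
  local config="$1"          # e.g. distilled_us, distilled_ws, distilled_s
  local dataset="$2"         # e.g. COCO, SPair-71k, CO3D
  local num_procs="${3:-4}"  # defaults to the 4 processes used above
  echo "accelerate launch --multi_gpu --num_processes ${num_procs} train.py ${config} --dataset_name ${dataset} --use_cache"
}

# Print the weakly supervised command for 2 processes.
build_train_cmd distilled_ws SPair-71k 2
```

Once the printed command looks right, it can be copied into the shell and run from the train directory.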

Refinement using CO3D

Follow the official instructions to download the CO3D dataset, then prepare it via

python datasets/create_co3d.py

And run the training via

accelerate launch --multi_gpu --num_processes 4 train.py distilled_s --dataset_name CO3D --use_cache

🎓 Citation

Please cite our paper:

@article{fundel2025distilldift,
  author    = {Frank Fundel and Johannes Schusterbauer and Vincent Tao Hu and Björn Ommer},
  title     = {Distillation of Diffusion Features for Semantic Correspondence},
  journal   = {WACV},
  year      = {2025},
}
