Accelerating Diffusion Transformers with Token-wise Feature Caching

Shenyi-Z/ToCa

[ICLR 2025] ToCa: Accelerating Diffusion Transformers with Token-wise Feature Caching

🔥 News

  • 2025/01/22 💥💥 We are honored that ToCa has been accepted to ICLR 2025!
  • 2024/12/29 🚀🚀 We release our work DuCa about accelerating diffusion transformers for FREE, which achieves nearly lossless acceleration of 2.50× on OpenSora! 🎉 DuCa also overcomes the limitation of ToCa by fully supporting FlashAttention, enabling broader compatibility and efficiency improvements.
  • 2024/12/24 🤗🤗 We release an open-source repo "Awesome-Token-Reduction-for-Model-Compression", which collects recent awesome token reduction papers! Feel free to contribute your suggestions!
  • 2024/12/20 💥💥 Our ToCa has achieved nearly lossless acceleration of 1.51× on FLUX. Feel free to check the latest version of our paper!
  • 2024/10/16 🤗🤗 Users with autodl accounts can now quickly experience OpenSora-ToCa by directly using our publicly available image!
  • 2024/10/12 🚀🚀 We release our work ToCa about accelerating diffusion transformers for FREE, which achieves nearly lossless acceleration of 2.36× on OpenSora!
  • 2024/07/15 🤗🤗 We release an open-source repo "Awesome-Generation-Acceleration", which collects recent awesome generation acceleration papers! Feel free to contribute your suggestions!

TODO:

  • Support for FLOPs calculation
  • Add the FLUX version of ToCa
  • Further optimize the code logic to reduce the time consumption of tensor operations

Dependencies

Python>=3.9
CUDA>=11.8
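
The two minimum versions above can be checked before creating an environment. The following is a minimal sketch, not part of this repository: `parse_version` and `meets_minimum` are hypothetical helper names, and the CUDA version string is assumed to be supplied by you (for example, read from `nvcc --version`).

```python
import sys


def parse_version(text):
    """Turn a dotted version string such as '11.8' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(found, required):
    """Return True if version string `found` is at least `required`."""
    found_parts = parse_version(found)
    required_parts = parse_version(required)
    # Pad the shorter tuple with zeros so that '3.9' compares like '3.9.0'.
    width = max(len(found_parts), len(required_parts))
    found_parts += (0,) * (width - len(found_parts))
    required_parts += (0,) * (width - len(required_parts))
    return found_parts >= required_parts


if __name__ == "__main__":
    # Version of the interpreter running this script.
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    print("Python", python_version, "ok:", meets_minimum(python_version, "3.9"))
    # Hypothetical value: replace with the CUDA version installed on your machine.
    cuda_version = "11.8"
    print("CUDA", cuda_version, "ok:", meets_minimum(cuda_version, "11.8"))
```

If PyTorch is already installed, `torch.version.cuda` reports the CUDA version it was built against and can be passed to `meets_minimum` instead of a hand-typed string.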

🛠 Installation

git clone https://github.com/Shenyi-Z/ToCa.git

Environment Settings

Original Models (recommended)

We evaluated our method in the same environments as the original models, so you can set up each environment by following the requirements of the corresponding original model.

Links:

Original Models urls
DiT https://github.com/facebookresearch/DiT
PixArt-α https://github.com/PixArt-alpha/PixArt-alpha
OpenSora https://github.com/hpcaitech/Open-Sora

From our environment.yaml

We also provide replicas of our own environments as conda environment files:

DiT
cd DiT-ToCa
conda env create -f environment-dit.yml
PixArt-α
cd PixArt-alpha-ToCa
conda env create -f environment-pixart.yml
OpenSora
cd Open-Sora
conda env create -f environment-opensora.yml
pip install -v . # for development mode, `pip install -v -e .`

🚀 Run and evaluation

Run DiT-ToCa

DDPM-250 Steps

sample images for visualization

cd DiT-ToCa
python sample.py --image-size 256 --num-sampling-steps 250 --cache-type attention --fresh-threshold 4 --fresh-ratio 0.07 --ratio-scheduler ToCa-ddpm250  --force-fresh global --soft-fresh-weight 0.25

sample images for evaluation (e.g., 50k)

cd DiT-ToCa
torchrun --nnodes=1 --nproc_per_node=6 sample_ddp.py --model DiT-XL/2 --per-proc-batch-size 150 --image-size 256 --cfg-scale 1.5 --num-sampling-steps 250 --cache-type attention --fresh-ratio 0.07 --ratio-scheduler ToCa-ddpm250 --force-fresh global --fresh-threshold 4 --soft-fresh-weight 0.25 --num-fid-samples 50000

DDIM-50 Steps

sample images for visualization

cd DiT-ToCa
python sample.py --image-size 256 --num-sampling-steps 50 --cache-type attention --fresh-threshold 3 --fresh-ratio 0.07 --ratio-scheduler ToCa-ddim50  --force-fresh global --soft-fresh-weight 0.25 --ddim-sample
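
The DDPM-250 and DDIM-50 visualization commands share most of their flags and differ only in the step count, the fresh threshold, the ratio scheduler, and `--ddim-sample`. The sketch below makes that difference explicit. It only reassembles the flag values shown in this README; `COMMON_FLAGS`, `PRESETS`, and `build_sample_command` are hypothetical names, not part of the repository.

```python
import shlex

# Flags that appear with identical values in both DiT-ToCa visualization commands.
COMMON_FLAGS = [
    "--image-size", "256",
    "--cache-type", "attention",
    "--fresh-ratio", "0.07",
    "--force-fresh", "global",
    "--soft-fresh-weight", "0.25",
]

# Flags whose values differ between the two sampler settings in this README.
PRESETS = {
    "ddpm250": [
        "--num-sampling-steps", "250",
        "--fresh-threshold", "4",
        "--ratio-scheduler", "ToCa-ddpm250",
    ],
    "ddim50": [
        "--num-sampling-steps", "50",
        "--fresh-threshold", "3",
        "--ratio-scheduler", "ToCa-ddim50",
        "--ddim-sample",
    ],
}


def build_sample_command(preset):
    """Return the argument list for sample.py under the given preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {sorted(PRESETS)}")
    return ["python", "sample.py"] + COMMON_FLAGS + PRESETS[preset]


if __name__ == "__main__":
    for name in PRESETS:
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(build_sample_command(name)))
```

Run the printed command from inside `DiT-ToCa`, as in the commands above. The flag order differs from the README lines, which should not matter for named command-line options.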

sample images for evaluation (e.g., 50k)

cd DiT-ToCa
torchrun --nnodes=1 --nproc_per_node=6 sample_ddp.py --model DiT-XL/2 --per-proc-batch-size 150 --image-size 256 --cfg-scale 1.5 --num-sampling-steps 50 --cache-type attention --fresh-ratio 0.07 --ratio-scheduler ToCa-ddim50 --force-fresh global --fresh-threshold 3 --soft-fresh-weight 0.25 --num-fid-samples 50000 --ddim-sample
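
When adapting the evaluation commands to a different number of GPUs, it helps to know how the sample count divides across processes. The sketch below works through that arithmetic for the settings above. It assumes that `sample_ddp.py` rounds the requested sample count up to a whole number of global batches, as the upstream DiT script does; this has not been verified against the ToCa version, and `ddp_sampling_plan` is a hypothetical helper name.

```python
import math


def ddp_sampling_plan(num_fid_samples, nproc_per_node, per_proc_batch_size, nnodes=1):
    """Estimate iterations and generated images for a distributed sampling run."""
    # Images produced by all processes together in one iteration.
    global_batch = nnodes * nproc_per_node * per_proc_batch_size
    # Assumption: the script rounds up so every process runs the same number of iterations.
    iterations = math.ceil(num_fid_samples / global_batch)
    total_generated = iterations * global_batch
    return {
        "global_batch": global_batch,
        "iterations_per_process": iterations,
        "total_generated": total_generated,
        "surplus": total_generated - num_fid_samples,
    }


if __name__ == "__main__":
    # Values taken from the torchrun commands in this README.
    print(ddp_sampling_plan(num_fid_samples=50000, nproc_per_node=6, per_proc_batch_size=150))
```

Under this assumption, 6 processes with a per-process batch of 150 give a global batch of 900, so 50,000 requested samples take 56 iterations per process and generate 50,400 images, 400 more than requested.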

test FLOPs

Just add --test-FLOPs. Here is an example:

cd DiT-ToCa
python sample.py --image-size 256 --num-sampling-steps 50 --cache-type attention --fresh-threshold 3 --fresh-ratio 0.07 --ratio-scheduler ToCa-ddim50  --force-fresh global --soft-fresh-weight 0.25 --ddim-sample --test-FLOPs

Run PixArt-α-ToCa

sample images for visualization

cd PixArt-alpha-ToCa
python scripts/inference.py --model_path /root/autodl-tmp/pretrained_models/PixArt-XL-2-256x256.pth --image_size 256 --bs 100 --txt_file /root/autodl-tmp/test.txt --fresh_threshold 3 --fresh_ratio 0.30 --cache_type attention --force_fresh global --soft_fresh_weight 0.25 --ratio_scheduler ToCa

sample images for evaluation (e.g., 30k for COCO, 1.6k for PartiPrompts)

cd PixArt-alpha-ToCa
torchrun --nproc_per_node=6 scripts/inference_ddp.py --model_path /root/autodl-tmp/pretrained_models/PixArt-XL-2-256x256.pth --image_size 256 --bs 100 --txt_file /root/autodl-tmp/COCO/COCO_caption_prompts_30k.txt --fresh_threshold 3 --fresh_ratio 0.30 --cache_type attention --force_fresh global --soft_fresh_weight 0.25 --ratio_scheduler ToCa

(If you need our npz file, it is available at: https://drive.google.com/file/d/1vUdoSgdIvtXo1cAS_aOFCJ1-XC_i1KEQ/view?usp=sharing)

Run OpenSora-ToCa

sample video for visualization

cd Open-Sora
python scripts/inference.py configs/opensora-v1-2/inference/sample.py   --num-frames 2s --resolution 480p --aspect-ratio 9:16   --prompt "a beautiful waterfall"

sample video for VBench evaluation

cd Open-Sora
bash eval/vbench/launch.sh /root/autodl-tmp/pretrained_models/hpcai-tech/OpenSora-STDiT-v3/model.safetensors 51 opensora-ToCa 480p 9:16

(Remember to replace "/root/autodl-tmp/pretrained_models/hpcai-tech/OpenSora-STDiT-v3/model.safetensors" with your own path!)
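
Since the checkpoint path is the one argument that must change per machine, a small wrapper can substitute it and fail early if the file is missing. This is a sketch only: `build_vbench_command` is a hypothetical helper, and the remaining positional arguments are copied verbatim from the command above without interpreting them.

```python
from pathlib import Path


def build_vbench_command(checkpoint_path, check_exists=True):
    """Return the launch.sh argument list with your own checkpoint path filled in."""
    path = Path(checkpoint_path).expanduser()
    if check_exists and not path.is_file():
        raise FileNotFoundError(f"checkpoint not found: {path}")
    # Everything after the checkpoint path is copied unchanged from the README command.
    return [
        "bash", "eval/vbench/launch.sh",
        str(path),
        "51", "opensora-ToCa", "480p", "9:16",
    ]


if __name__ == "__main__":
    # Hypothetical location: point this at your own model.safetensors file.
    example = "~/pretrained_models/hpcai-tech/OpenSora-STDiT-v3/model.safetensors"
    try:
        print(" ".join(build_vbench_command(example)))
    except FileNotFoundError as err:
        print(err)
```

The resulting list can be passed to `subprocess.run(..., cwd="Open-Sora")` so that the script runs from the same directory as in the command above.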

👍 Acknowledgements

  • Thanks to DiT for their great work and codebase upon which we build DiT-ToCa.
  • Thanks to PixArt-α for their great work and codebase upon which we build PixArt-α-ToCa.
  • Thanks to OpenSora for their great work and codebase upon which we build OpenSora-ToCa.

📌 Citation

@article{zou2024accelerating,
  title={Accelerating Diffusion Transformers with Token-wise Feature Caching},
  author={Zou, Chang and Liu, Xuyang and Liu, Ting and Huang, Siteng and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2410.05317},
  year={2024}
}

📧 Contact

If you have any questions, please email shenyizou@outlook.com.
