- 2025/01/22: 🔥🔥 ToCa is honored to be accepted by ICLR 2025!
- 2024/12/29: 🎉🎉 We release our work DuCa on accelerating diffusion transformers for FREE, which achieves nearly lossless acceleration of 2.50× on OpenSora! DuCa also overcomes the limitation of ToCa by fully supporting FlashAttention, enabling broader compatibility and efficiency improvements.
- 2024/12/24: 🤗🤗 We release an open-source repo "Awesome-Token-Reduction-for-Model-Compression", which collects recent awesome token reduction papers! Feel free to contribute your suggestions!
- 2024/12/20: 🔥🔥 Our ToCa has achieved nearly lossless acceleration of 1.51× on FLUX; feel free to check the latest version of our paper!
- 2024/10/16: 🤗🤗 Users with autodl accounts can now quickly experience OpenSora-ToCa by directly using our publicly available image!
- 2024/10/12: 🎉🎉 We release our work ToCa on accelerating diffusion transformers for FREE, which achieves nearly lossless acceleration of 2.36× on OpenSora!
- 2024/07/15: 🤗🤗 We release an open-source repo "Awesome-Generation-Acceleration", which collects recent awesome generation acceleration papers! Feel free to contribute your suggestions!
- Support for FLOPs calculation
- Add the FLUX version of ToCa
- Further optimize the code logic to reduce the time consumption of tensor operations
```
Python >= 3.9
CUDA >= 11.8
```
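A quick way to verify these requirements locally (a minimal sketch; `nvcc` may be absent if only the GPU driver is installed):

```bash
python --version  # expect Python >= 3.9
nvcc --version    # expect CUDA >= 11.8
# check the CUDA version PyTorch was built against and that a GPU is visible
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```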
```bash
git clone https://github.com/Shenyi-Z/ToCa.git
```
We evaluated our models under the same environments as the original models, so you can set up each environment by following the requirements of the corresponding original model.
Links:
| Original Model | URL |
|---|---|
| DiT | https://github.com/facebookresearch/DiT |
| PixArt-α | https://github.com/PixArt-alpha/PixArt-alpha |
| OpenSora | https://github.com/hpcaitech/Open-Sora |
Besides, we provide replicas of our environments here:
```bash
cd DiT-ToCa
conda env create -f environment-dit.yml
```
```bash
cd PixArt-alpha-ToCa
conda env create -f environment-pixart.yml
```
```bash
cd Open-Sora
conda env create -f environment-opensora.yml
pip install -v .  # for development mode, `pip install -v -e .`
```
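After creating an environment, activate it before running the sampling commands below; the environment name is defined inside the corresponding `.yml` file (the name below is a placeholder):

```bash
conda env list             # list environments to find the newly created one
conda activate <env-name>  # placeholder: use the name defined in the .yml file
```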
sample images for visualization:
```bash
cd DiT-ToCa
python sample.py --image-size 256 --num-sampling-steps 250 --cache-type attention --fresh-threshold 4 --fresh-ratio 0.07 --ratio-scheduler ToCa-ddpm250 --force-fresh global --soft-fresh-weight 0.25
```
sample images for evaluation (e.g., 50k):
```bash
cd DiT-ToCa
torchrun --nnodes=1 --nproc_per_node=6 sample_ddp.py --model DiT-XL/2 --per-proc-batch-size 150 --image-size 256 --cfg-scale 1.5 --num-sampling-steps 250 --cache-type attention --fresh-ratio 0.07 --ratio-scheduler ToCa-ddpm250 --force-fresh global --fresh-threshold 4 --soft-fresh-weight 0.25 --num-fid-samples 50000
```
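The original DiT codebase states that `sample_ddp.py` stores the generated samples in an `.npz` file compatible with ADM's TensorFlow evaluation suite; a minimal FID-evaluation sketch under that assumption (both file names below are placeholders):

```bash
# the evaluator comes from OpenAI's guided-diffusion repo and requires TensorFlow
git clone https://github.com/openai/guided-diffusion.git
# first argument: reference batch; second: the .npz written by sample_ddp.py
python guided-diffusion/evaluations/evaluator.py ref_batch.npz toca_samples.npz
```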
sample images for visualization:
```bash
cd DiT-ToCa
python sample.py --image-size 256 --num-sampling-steps 50 --cache-type attention --fresh-threshold 3 --fresh-ratio 0.07 --ratio-scheduler ToCa-ddim50 --force-fresh global --soft-fresh-weight 0.25 --ddim-sample
```
sample images for evaluation (e.g., 50k):
```bash
cd DiT-ToCa
torchrun --nnodes=1 --nproc_per_node=6 sample_ddp.py --model DiT-XL/2 --per-proc-batch-size 150 --image-size 256 --cfg-scale 1.5 --num-sampling-steps 50 --cache-type attention --fresh-ratio 0.07 --ratio-scheduler ToCa-ddim50 --force-fresh global --fresh-threshold 3 --soft-fresh-weight 0.25 --num-fid-samples 50000 --ddim-sample
```
Just add `--test-FLOPs`; here is an example:
```bash
cd DiT-ToCa
python sample.py --image-size 256 --num-sampling-steps 50 --cache-type attention --fresh-threshold 3 --fresh-ratio 0.07 --ratio-scheduler ToCa-ddim50 --force-fresh global --soft-fresh-weight 0.25 --ddim-sample --test-FLOPs
```
sample images for visualization:
```bash
cd PixArt-alpha-ToCa
python scripts/inference.py --model_path /root/autodl-tmp/pretrained_models/PixArt-XL-2-256x256.pth --image_size 256 --bs 100 --txt_file /root/autodl-tmp/test.txt --fresh_threshold 3 --fresh_ratio 0.30 --cache_type attention --force_fresh global --soft_fresh_weight 0.25 --ratio_scheduler ToCa
```
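The file passed via `--txt_file` is a plain-text prompt list; assuming the original PixArt-α convention of one prompt per line, a hypothetical example (the path matches the command above, and the prompts are placeholders):

```bash
cat > /root/autodl-tmp/test.txt << 'EOF'
a cute corgi wearing sunglasses
an oil painting of a lighthouse at dusk
EOF
```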
sample images for evaluation (e.g., 30k for COCO, 1.6k for PartiPrompts):
```bash
cd PixArt-alpha-ToCa
torchrun --nproc_per_node=6 scripts/inference_ddp.py --model_path /root/autodl-tmp/pretrained_models/PixArt-XL-2-256x256.pth --image_size 256 --bs 100 --txt_file /root/autodl-tmp/COCO/COCO_caption_prompts_30k.txt --fresh_threshold 3 --fresh_ratio 0.30 --cache_type attention --force_fresh global --soft_fresh_weight 0.25 --ratio_scheduler ToCa
```
(Besides, if you need our npz file: https://drive.google.com/file/d/1vUdoSgdIvtXo1cAS_aOFCJ1-XC_i1KEQ/view?usp=sharing)
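Before running evaluation you can sanity-check the downloaded reference file; a minimal sketch (`ref.npz` is a placeholder for wherever you saved it):

```bash
# print the array names and shapes stored in the reference .npz
python -c "import numpy as np; d = np.load('ref.npz'); print({k: d[k].shape for k in d.files})"
```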
sample video for visualization:
```bash
cd Open-Sora
python scripts/inference.py configs/opensora-v1-2/inference/sample.py --num-frames 2s --resolution 480p --aspect-ratio 9:16 --prompt "a beautiful waterfall"
```
sample video for VBench evaluation:
```bash
cd Open-Sora
bash eval/vbench/launch.sh /root/autodl-tmp/pretrained_models/hpcai-tech/OpenSora-STDiT-v3/model.safetensors 51 opensora-ToCa 480p 9:16
```
(Remember to replace "/root/autodl-tmp/pretrained_models/hpcai-tech/OpenSora-STDiT-v3/model.safetensors" with your own path!)
- Thanks to DiT for their great work and codebase upon which we build DiT-ToCa.
- Thanks to PixArt-Ξ± for their great work and codebase upon which we build PixArt-Ξ±-ToCa.
- Thanks to OpenSora for their great work and codebase upon which we build OpenSora-ToCa.
```bibtex
@article{zou2024accelerating,
  title={Accelerating Diffusion Transformers with Token-wise Feature Caching},
  author={Zou, Chang and Liu, Xuyang and Liu, Ting and Huang, Siteng and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2410.05317},
  year={2024}
}
```
If you have any questions, please email shenyizou@outlook.com.