An implementation of Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models (E4T) using d🧨ffusers.
My summary tweet is found here.
- Released the current-best pre-trained model, trained on CelebA-HQ + FFHQ. Please see the Model Zoo for more information.
$ git clone https://github.com/mkshing/e4t-diffusion.git
$ cd e4t-diffusion
$ pip install -r requirements.txt
- e4t-diffusion-ffhq-celebahq-v1: a pre-trained model for the face domain, trained on FFHQ + CelebA-HQ. To get better results, I used Stable unCLIP for data augmentation.
Sample images: logs from the pre-training phase, and "a photo of *s in the beach" after domain tuning on a photo of Yann LeCun.
You need a domain-specific E4T pre-trained model that matches your target image. If your target image is a face, you need to pre-train on a large face image dataset such as FFHQ or CelebA-HQ. Or, if your target is an artistic image, you might want to pre-train on WikiArt, like so:
accelerate launch pretrain_e4t.py \
--pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
--clip_model_name_or_path="ViT-H-14::laion2b_s32b_b79k" \
--domain_class_token="art" \
--placeholder_token="*s" \
--prompt_template="art" \
--save_sample_prompt="a photo of the *s,a photo of the *s in monet style" \
--reg_lambda=0.01 \
--domain_embed_scale=0.1 \
--output_dir="pretrained-wikiart" \
--train_image_dataset="Artificio/WikiArt" \
--iterable_dataset \
--resolution=512 \
--train_batch_size=16 \
--learning_rate=1e-6 --scale_lr \
--checkpointing_steps=10000 \
--log_steps=1000 \
--max_train_steps=100000 \
--unfreeze_clip_vision \
--mixed_precision="fp16" \
--enable_xformers_memory_efficient_attention
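The Model Zoo above mentions that Stable unCLIP was used for data augmentation when pre-training the face model. The snippet below is only a minimal sketch of how such image variations could be generated with diffusers' `StableUnCLIPImg2ImgPipeline`; the model id, prompt, and file paths are assumptions and are not part of this repository.

```python
import torch
from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image

# Stable unCLIP image-variation pipeline (model id is an assumption).
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to("cuda")

# Generate a few variations of one source image to enlarge the pre-training set.
source = load_image("path/to/source_face.png")  # hypothetical input image
variations = pipe(image=source, prompt="a photo of a face", num_images_per_prompt=4).images
for i, img in enumerate(variations):
    img.save(f"augmented_{i:02d}.png")
```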
Once you have a pre-trained model, you are ready for domain tuning! In this step, all parameters, including the UNet itself (and optionally the text encoder), are trained. Unlike DreamBooth, E4T needs fewer than 15 training steps according to the paper.
accelerate launch tuning_e4t.py \
--pretrained_model_name_or_path="e4t pre-trained model path" \
--prompt_template="a photo of {placeholder_token}" \
--reg_lambda=0.1 \
--output_dir="path-to-save-model" \
--train_image_path="image path or url" \
--resolution=512 \
--train_batch_size=16 \
--learning_rate=1e-6 --scale_lr \
--max_train_steps=30 \
--mixed_precision="fp16" \
--enable_xformers_memory_efficient_attention
Once domain tuning is done, you can run inference by including your placeholder token in the prompt.
python inference.py \
--pretrained_model_name_or_path "e4t pre-trained model path" \
--prompt "Times square in the style of *s" \
--num_images_per_prompt 3 \
--scheduler_type "ddim" \
--image_path_or_url "same image path or url as domain tuning" \
--num_inference_steps 50 \
--guidance_scale 7.5
I would like to thank Stability AI for providing the compute resources used to test this code and train the pre-trained models.
@misc{https://doi.org/10.48550/arXiv.2302.12228,
url = {https://arxiv.org/abs/2302.12228},
author = {Rinon Gal and Moab Arar and Yuval Atzmon and Amit H. Bermano and Gal Chechik and Daniel Cohen-Or},
title = {Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models},
publisher = {arXiv},
year = {2023},
copyright = {arXiv.org perpetual, non-exclusive license}
}
- Pre-training
- Domain-tuning
- Inference
- Data augmentation by Stable unCLIP
- Use an off-the-shelf face segmentation network for the human face domain (see the masked-loss sketch after this list).
  > Finally, we find that for the human face domain, it is helpful to use an off-the-shelf face segmentation network [Deng et al. 2019] to mask the diffusion loss at this stage.
- Support ToMe for more efficient training
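Regarding the face segmentation item above, the idea from the paper is to mask the diffusion loss so that only face pixels contribute. Below is a minimal sketch of that masking, assuming a binary face mask obtained from some off-the-shelf segmentation network; the function name and tensor shapes are assumptions, and this is not code from the repository.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model_pred: torch.Tensor,
                          target: torch.Tensor,
                          face_mask: torch.Tensor) -> torch.Tensor:
    """MSE noise-prediction loss restricted to face pixels.

    model_pred, target: (B, C, H, W) predicted and target noise at latent resolution
    face_mask:          (B, 1, h, w) binary mask from a face segmentation network
    """
    # Resize the mask to the latent resolution and broadcast it over channels.
    mask = F.interpolate(face_mask.float(), size=model_pred.shape[-2:], mode="nearest")
    # Per-element squared error, zeroed outside the face region.
    loss = (model_pred.float() - target.float()) ** 2 * mask
    # Normalize by the number of contributing elements to keep the loss scale stable.
    denom = mask.sum() * model_pred.shape[1]
    return loss.sum() / denom.clamp(min=1.0)
```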