What's changed
Released v0.2.0
Improved the following parts based on feedback from the author, @phymhan (#3)!
- Train spectral shifts for 1-D weights such as LayerNorm too (file size: 935kB, up from 923kB).
- Use a different learning rate for 1-D weights via --learning_rate_1d.
- Additionally, train spectral shifts of the text encoder with --train_text_encoder (file size: 1.17MB).
With these changes, you get better results in fewer training steps than with the first release, v0.1.1!
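To illustrate why training shifts for 1-D weights adds so little to the file size, here is a minimal sketch. It assumes a 1-D weight is viewed as a 1 x n matrix, which has exactly one singular value (its Euclidean norm), and that the shift is applied as ReLU(sigma + delta). The function name `shift_1d_weight` is hypothetical and is not part of this repository's API.

```python
import math

def shift_1d_weight(weight, delta):
    """Apply a spectral shift to a 1-D weight viewed as a 1 x n matrix.

    A 1 x n matrix has exactly one singular value, its Euclidean norm,
    so a single scalar `delta` rescales the whole vector.
    """
    sigma = math.sqrt(sum(w * w for w in weight))  # the only singular value
    if sigma == 0.0:
        return list(weight)                        # no direction to rescale
    direction = [w / sigma for w in weight]        # right singular vector
    new_sigma = max(sigma + delta, 0.0)            # ReLU keeps it non-negative
    return [new_sigma * d for d in direction]

weight = [3.0, 4.0]                   # Euclidean norm is 5
print(shift_1d_weight(weight, 0.0))   # delta = 0 keeps the weight (up to rounding)
print(shift_1d_weight(weight, 5.0))   # delta = 5 doubles the norm
```

Under this assumption each 1-D weight contributes only one extra trainable scalar, which fits the small increase in checkpoint size reported above.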
sample command
accelerate launch svdiff-pytorch-2/train_svdiff.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--instance_data_dir=$INSTANCE_DATA_DIR \
--class_data_dir=$CLASS_DATA_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="photo of sks woman" \
--class_prompt="photo of a woman" \
--with_prior_preservation --prior_loss_weight=1.0 \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-3 \
--learning_rate_1d=1e-6 \
--train_text_encoder \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--checkpointing_steps=200 \
--max_train_steps=1000 \
--use_8bit_adam \
--enable_xformers_memory_efficient_attention \
--seed=42 \
--gradient_checkpointing
"portrait of sks woman wearing kimono", where sks indicates Gal Gadot.
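The command above passes both --learning_rate and --learning_rate_1d. One common way to give two sets of parameters different learning rates is to build per-parameter option groups, the list-of-dicts format that PyTorch optimizers accept. The sketch below shows only the grouping logic. The helper `build_param_groups` and the `(param, weight_ndim)` bookkeeping are hypothetical, and the training script may organize this differently.

```python
def build_param_groups(shift_params, learning_rate=1e-3, learning_rate_1d=1e-6):
    """Split spectral-shift parameters into two optimizer groups.

    `shift_params` is a list of (param, weight_ndim) pairs, where
    `weight_ndim` is the number of dimensions of the original weight
    the shift belongs to (hypothetical bookkeeping for this sketch).
    """
    params_nd = [p for p, ndim in shift_params if ndim > 1]
    params_1d = [p for p, ndim in shift_params if ndim == 1]
    return [
        {"params": params_nd, "lr": learning_rate},     # linear / conv weights
        {"params": params_1d, "lr": learning_rate_1d},  # e.g. LayerNorm weights
    ]

# Placeholder strings stand in for real parameter tensors.
groups = build_param_groups([
    ("attn.to_q.delta", 2),
    ("conv_in.delta", 4),
    ("norm1.delta", 1),
])
for group in groups:
    print(group["lr"], group["params"])
```

With real tensors in place of the strings, a list like `groups` can be passed directly as the first argument of a PyTorch optimizer.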
Added Single Image Editing
sample script
training
accelerate launch train_svdiff.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--instance_data_dir="pink-chair-dir" \
--output_dir="output-dir" \
--instance_prompt="photo of a pink chair with black legs" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-3 \
--learning_rate_1d=1e-6 \
--train_text_encoder \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=500 \
--use_8bit_adam \
--enable_xformers_memory_efficient_attention \
--seed=42 \
--gradient_checkpointing
inference
import sys
import torch
from PIL import Image
from diffusers import DDIMScheduler
sys.path.append("/content/svdiff-pytorch-2")
from svdiff_pytorch import load_unet_for_svdiff, load_text_encoder_for_svdiff, StableDiffusionPipelineWithDDIMInversion
pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
spectral_shifts_ckpt_dir = "/content/SIE/checkpoint-500"
image = "pink-chair.jpeg"
source_prompt = "photo of a pink chair with black legs"
target_prompt = "photo of a blue chair with black legs"
unet = load_unet_for_svdiff(pretrained_model_name_or_path, spectral_shifts_ckpt=spectral_shifts_ckpt_dir, subfolder="unet")
text_encoder = load_text_encoder_for_svdiff(pretrained_model_name_or_path, spectral_shifts_ckpt=spectral_shifts_ckpt_dir, subfolder="text_encoder")
# load pipe
pipe = StableDiffusionPipelineWithDDIMInversion.from_pretrained(
    pretrained_model_name_or_path,
    unet=unet,
    text_encoder=text_encoder,
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")
# DDIM inversion is not used in this example, so sampling starts from random latents
inv_latents = None
# (optional) DDIM inversion: uncomment the lines below to start from the inverted input image
# image = Image.open(image).convert("RGB").resize((512, 512))
# SVDiff uses guidance_scale=1 for DDIM inversion
# inv_latents = pipe.invert(source_prompt, image=image, guidance_scale=1.0).latents
image = pipe(target_prompt, latents=inv_latents).images[0]
"photo of a blue chair with black legs" (the source prompt with "pink" replaced by "blue")
* the input image was taken from https://unsplash.com/photos/1JJJIHh7-Mk
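For readers unfamiliar with the optional DDIM inversion step mentioned in the inference script, the sketch below shows the deterministic DDIM update on a single scalar. It is a toy illustration under stated assumptions, not the code behind `pipe.invert`: the function `ddim_step` is hypothetical, and the noise estimate `eps` is held fixed, whereas a real pipeline re-predicts it with the UNet at every step, so real inversion is only approximate.

```python
import math

def ddim_step(x, eps, alpha_from, alpha_to):
    """Move a sample between two noise levels with the deterministic DDIM update.

    `alpha_from` / `alpha_to` are cumulative alpha products in (0, 1].
    `eps` is the noise estimate; a real pipeline gets it from the UNet.
    """
    # Estimate the clean sample implied by x and eps at the current noise level.
    x0_pred = (x - math.sqrt(1.0 - alpha_from) * eps) / math.sqrt(alpha_from)
    # Re-noise that estimate to the target noise level using the same eps.
    return math.sqrt(alpha_to) * x0_pred + math.sqrt(1.0 - alpha_to) * eps

x_start = 0.8
eps = -0.3  # held fixed here for illustration only

# Inversion direction: towards more noise (smaller cumulative alpha).
x_inverted = ddim_step(x_start, eps, alpha_from=0.9, alpha_to=0.5)
# Sampling direction: back towards less noise.
x_back = ddim_step(x_inverted, eps, alpha_from=0.5, alpha_to=0.9)

print(x_start, x_back)  # the two values agree up to floating-point rounding
```

Because the same update runs in both directions, latents obtained by inversion can serve as a starting point that reconstructs the input image, which is why they are passed as `latents` when editing.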