
Init
ShenhaoZhu committed Mar 23, 2024
1 parent f5bc49e commit 98a762e
Showing 42 changed files with 7,308 additions and 2 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
pretrained_models/
example_data/
results/
*.zip
.vscode/
.hypothesis/
99 changes: 97 additions & 2 deletions README.md
@@ -1,3 +1,98 @@
<h1 align='Center'>Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance</h1>

<div align='Center'>
<a href='https://github.com/ShenhaoZhu' target='_blank'>Shenhao Zhu</a><sup>*1</sup>&emsp;
<a href='https://github.com/Leoooo333' target='_blank'>Junming Leo Chen</a><sup>*2</sup>&emsp;
<a href='https://github.com/daizuozhuo' target='_blank'>Zuozhuo Dai</a><sup>3</sup>&emsp;
<a href='https://ai3.fudan.edu.cn/info/1088/1266.htm' target='_blank'>Yinghui Xu</a><sup>2</sup>&emsp;
<a href='https://yoyo000.github.io/' target='_blank'>Yao Yao</a><sup>1</sup>&emsp;
<a href='http://zhuhao.cc/home/' target='_blank'>Hao Zhu</a><sup>+1</sup>&emsp;
<a href='https://sites.google.com/site/zhusiyucs/home' target='_blank'>Siyu Zhu</a><sup>+2</sup>
</div>
<div align='Center'>
<sup>1</sup>Nanjing University <sup>2</sup>Fudan University <sup>3</sup>Alibaba Group
</div>
<div align='Center'>
<sup>*</sup>Equal Contribution
<sup>+</sup>Corresponding Author
</div>

# Framework
![framework](assets/framework.jpg)

# Installation
- System requirement: Ubuntu 20.04
- Tested GPUs: A100

Create a conda environment:
```bash
conda create -n champ python=3.10
conda activate champ
```
Install packages with `pip`:
```bash
pip install -r requirements.txt
```
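
Optionally, you can sanity-check the environment before downloading any weights. The snippet below is a minimal sketch, not part of the repository; it assumes `torch` and `xformers` are pulled in by `requirements.txt`, since the provided inference config enables xformers memory-efficient attention:
```python
# sanity_check.py -- hypothetical helper, not part of the repository
import torch

# Inference at 512x512 in fp16 expects a CUDA-capable GPU (tested on A100).
print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

# The config sets enable_xformers_memory_efficient_attention: true,
# so this import should succeed if the requirements installed cleanly.
try:
    import xformers
    print(f"xformers {xformers.__version__}")
except ImportError:
    print("xformers not installed; memory-efficient attention unavailable.")
```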

# Download pretrained models

1. Download the pretrained weights of the base models:
- [StableDiffusion V1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5)
- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)
- [image_encoder](https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder)

2. Download our checkpoints: \
Our [checkpoints](https://drive.google.com/drive/folders/1hZiOHG-qDf0Pj7tvfxC70JQ6wHUvUDoY?usp=sharing) consist of a denoising UNet, guidance encoders, a reference UNet, and a motion module.

Finally, these pretrained models should be organized as follows:

```text
./pretrained_models/
|-- champ
| |-- denoising_unet.pth
| |-- guidance_encoder_depth.pth
| |-- guidance_encoder_dwpose.pth
| |-- guidance_encoder_normal.pth
| |-- guidance_encoder_semantic_map.pth
| |-- reference_unet.pth
| `-- motion_module.pth
|-- image_encoder
| |-- config.json
| `-- pytorch_model.bin
|-- sd-vae-ft-mse
| |-- config.json
| |-- diffusion_pytorch_model.bin
| `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
|-- feature_extractor
| `-- preprocessor_config.json
|-- model_index.json
|-- unet
| |-- config.json
| `-- diffusion_pytorch_model.bin
`-- v1-inference.yaml
```
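
To mirror this layout, the base models can be fetched programmatically. This is an illustrative sketch using `huggingface_hub` (an assumption; no download method is prescribed above), pulling only the pieces shown in the tree:
```python
# download_weights.py -- illustrative sketch, assuming huggingface_hub is installed
from huggingface_hub import hf_hub_download, snapshot_download

# Stable Diffusion v1.5: only the subfolders/files referenced above.
snapshot_download(
    repo_id="runwayml/stable-diffusion-v1-5",
    local_dir="pretrained_models/stable-diffusion-v1-5",
    allow_patterns=["v1-inference.yaml", "model_index.json",
                    "unet/*", "feature_extractor/*"],
)

# VAE fine-tuned with MSE loss.
snapshot_download(
    repo_id="stabilityai/sd-vae-ft-mse",
    local_dir="pretrained_models/sd-vae-ft-mse",
)

# CLIP image encoder from sd-image-variations-diffusers.
for fname in ["image_encoder/config.json", "image_encoder/pytorch_model.bin"]:
    hf_hub_download(
        repo_id="lambdalabs/sd-image-variations-diffusers",
        filename=fname,
        local_dir="pretrained_models",
    )
```
The Champ checkpoints themselves are hosted on Google Drive and still need to be placed manually under `pretrained_models/champ/`.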

# Inference
We have provided several sets of [example data]() for inference. Please download them first and place them in the `example_data` folder.
Here is the command for inference:
```bash
python inference.py --config configs/inference.yaml
```
Animation results will be saved in the `results` folder. You can change the reference image or the guidance motion by modifying `inference.yaml`. We will later provide the code for obtaining driving motion from in-the-wild videos.
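
Before launching, it can help to verify that the config's paths resolve. The following is a hypothetical pre-flight check, not part of the repository, assuming only PyYAML and the config layout shown below:
```python
# check_config.py -- hypothetical pre-flight check, not part of the repository
import os

import yaml

with open("configs/inference.yaml") as f:
    cfg = yaml.safe_load(f)

paths = [
    cfg["data"]["ref_image_path"],        # reference image to animate
    cfg["data"]["guidance_data_folder"],  # depth/normal/semantic_map/dwpose sequences
    cfg["base_model_path"],
    cfg["vae_model_path"],
    cfg["image_encoder_path"],
    cfg["ckpt_dir"],
    cfg["motion_module_path"],
]
for p in paths:
    print(f"{'OK     ' if os.path.exists(p) else 'MISSING'} {p}")
```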

# Acknowledgements
We thank the authors of [MagicAnimate](https://github.com/magic-research/magic-animate), [Animate Anyone](https://github.com/HumanAIGC/AnimateAnyone), and [AnimateDiff](https://github.com/guoyww/AnimateDiff) for their excellent work. Our project is built upon [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone), and we are grateful for their open-source contributions.

# Citation
If you find our work useful for your research, please consider citing the paper:
```bibtex
@inproceedings{zhu2024champ,
  author    = {Shenhao Zhu and Junming Leo Chen and Zuozhuo Dai and Yinghui Xu and Xun Cao and Yao Yao and Hao Zhu and Siyu Zhu},
  title     = {Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance},
  booktitle = {arXiv},
  year      = {2024}
}
```
Binary file added assets/framework.jpg
63 changes: 63 additions & 0 deletions configs/inference.yaml
@@ -0,0 +1,63 @@
exp_name: Animation
width: 512
height: 512
data:
  ref_image_path: 'example_data/ref_images/ref-01.png'    # reference image path
  guidance_data_folder: 'example_data/motions/motion-01'  # corresponding motion sequence folder
seed: 42

base_model_path: 'pretrained_models/stable-diffusion-v1-5'
vae_model_path: 'pretrained_models/sd-vae-ft-mse'
image_encoder_path: 'pretrained_models/image_encoder'

ckpt_dir: 'pretrained_models/champ'
motion_module_path: 'pretrained_models/champ/motion_module.pth'

num_inference_steps: 20
guidance_scale: 3.5
enable_zero_snr: true
weight_dtype: "fp16"

guidance_types:
  - 'depth'
  - 'normal'
  - 'semantic_map'
  - 'dwpose'

noise_scheduler_kwargs:
  num_train_timesteps: 1000
  beta_start: 0.00085
  beta_end: 0.012
  beta_schedule: "linear"
  steps_offset: 1
  clip_sample: false

unet_additional_kwargs:
  use_inflated_groupnorm: true
  unet_use_cross_frame_attention: false
  unet_use_temporal_attention: false
  use_motion_module: true
  motion_module_resolutions:
    - 1
    - 2
    - 4
    - 8
  motion_module_mid_block: true
  motion_module_decoder_only: false
  motion_module_type: Vanilla
  motion_module_kwargs:
    num_attention_heads: 8
    num_transformer_block: 1
    attention_block_types:
      - Temporal_Self
      - Temporal_Self
    temporal_position_encoding: true
    temporal_position_encoding_max_len: 32
    temporal_attention_dim_div: 1

guidance_encoder_kwargs:
  guidance_embedding_channels: 320
  guidance_input_channels: 3
  block_out_channels: [16, 32, 96, 256]

enable_xformers_memory_efficient_attention: true
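
As an illustration of how `noise_scheduler_kwargs` might map onto a diffusers-style scheduler (an assumption; the scheduler class actually used is not shown in this diff), note that `enable_zero_snr` plausibly corresponds to diffusers' zero-terminal-SNR rescaling:
```python
# scheduler_sketch.py -- illustrative only; the repo's actual wiring may differ
from diffusers import DDIMScheduler

# Values copied from noise_scheduler_kwargs above.
noise_scheduler_kwargs = dict(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="linear",
    steps_offset=1,
    clip_sample=False,
)

# enable_zero_snr: true plausibly maps to zero-terminal-SNR rescaling,
# which DDIMScheduler exposes as a flag; "trailing" timestep spacing and
# v-prediction are the settings commonly paired with it (assumptions here).
scheduler = DDIMScheduler(
    **noise_scheduler_kwargs,
    rescale_betas_zero_snr=True,
    timestep_spacing="trailing",
    prediction_type="v_prediction",
)
print(scheduler.config)
```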
