forked from fudan-generative-vision/champ
Commit 98a762e (1 parent: f5bc49e)
Showing 42 changed files with 7,308 additions and 2 deletions.
@@ -0,0 +1,6 @@
pretrained_models/
example_data/
results/
*.zip
.vscode/
.hypothesis/
@@ -1,3 +1,98 @@
# CHAMP: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

## Comming Soon
<h1 align='Center'>Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance</h1>

<div align='Center'>
<a href='https://github.com/ShenhaoZhu' target='_blank'>Shenhao Zhu</a><sup>*1</sup>&emsp;
<a href='https://github.com/Leoooo333' target='_blank'>Junming Leo Chen</a><sup>*2</sup>&emsp;
<a href='https://github.com/daizuozhuo' target='_blank'>Zuozhuo Dai</a><sup>3</sup>&emsp;
<a href='https://ai3.fudan.edu.cn/info/1088/1266.htm' target='_blank'>Yinghui Xu</a><sup>2</sup>&emsp;
<a href='https://yoyo000.github.io/' target='_blank'>Yao Yao</a><sup>1</sup>&emsp;
<a href='http://zhuhao.cc/home/' target='_blank'>Hao Zhu</a><sup>+1</sup>&emsp;
<a href='https://sites.google.com/site/zhusiyucs/home' target='_blank'>Siyu Zhu</a><sup>+2</sup>
</div>
<div align='Center'>
<sup>1</sup>Nanjing University <sup>2</sup>Fudan University <sup>3</sup>Alibaba Group
</div>
<div align='Center'>
<sup>*</sup>Equal Contribution
<sup>+</sup>Corresponding Author
</div>

# Framework
![framework](assets/framework.jpg)

# Installation
- System requirement: Ubuntu 20.04
- Tested GPUs: A100

Create a conda environment:
```bash
conda create -n champ python=3.10
conda activate champ
```
Install packages with `pip`:
```bash
pip install -r requirements.txt
```
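Before moving on to the model downloads, it can be worth confirming that the environment actually sees a GPU. A minimal sanity check, assuming PyTorch is pulled in by `requirements.txt` (our assumption, not something stated above):
```python
# Optional sanity check -- assumes torch is installed via requirements.txt.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # The README lists an A100 as the tested GPU; any recent CUDA device should print here.
    print("device:", torch.cuda.get_device_name(0))
```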

# Download pretrained models

1. Download the pretrained weights of the base models:
    - [StableDiffusion V1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5)
    - [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)
    - [image_encoder](https://huggingface.co/lambdalabs/sd-image-variations-diffusers/tree/main/image_encoder)

2. Download our checkpoints: \
Our [checkpoints](https://drive.google.com/drive/folders/1hZiOHG-qDf0Pj7tvfxC70JQ6wHUvUDoY?usp=sharing) consist of the denoising UNet, guidance encoders, reference UNet, and motion module.

Finally, these pretrained models should be organized as follows:

```text
./pretrained_models/
|-- champ
|   |-- denoising_unet.pth
|   |-- guidance_encoder_depth.pth
|   |-- guidance_encoder_dwpose.pth
|   |-- guidance_encoder_normal.pth
|   |-- guidance_encoder_semantic_map.pth
|   |-- reference_unet.pth
|   `-- motion_module.pth
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml
```
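If you prefer to script the base-model downloads rather than fetch them by hand, a sketch along the lines below reproduces most of the layout above. This uses `huggingface_hub.snapshot_download` and is our own suggestion, not part of this repository; the `champ` checkpoints from the Google Drive link still have to be placed manually.
```python
# Hedged sketch: mirror part of ./pretrained_models using huggingface_hub.
# The champ/ checkpoints (Google Drive) are not covered here.
from pathlib import Path
import shutil

from huggingface_hub import snapshot_download

root = Path("pretrained_models")

# Stable Diffusion v1.5 -- only the pieces the layout above actually uses.
snapshot_download(
    repo_id="runwayml/stable-diffusion-v1-5",
    local_dir=root / "stable-diffusion-v1-5",
    allow_patterns=["v1-inference.yaml", "model_index.json",
                    "unet/*", "feature_extractor/*"],
)

# VAE
snapshot_download(
    repo_id="stabilityai/sd-vae-ft-mse",
    local_dir=root / "sd-vae-ft-mse",
)

# CLIP image encoder: it lives in the image_encoder/ subfolder of the
# sd-image-variations-diffusers repo, so download that subfolder and lift it up.
tmp = snapshot_download(
    repo_id="lambdalabs/sd-image-variations-diffusers",
    allow_patterns=["image_encoder/*"],
)
shutil.copytree(Path(tmp) / "image_encoder", root / "image_encoder", dirs_exist_ok=True)
```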

# Inference
We have provided several sets of [example data]() for inference. Please download them first and place them in the `example_data` folder.
Here is the command for inference:
```bash
python inference.py --config configs/inference.yaml
```
Animation results will be saved in the `results` folder. You can change the reference image or the guidance motion by modifying `inference.yaml`. We will provide the code for obtaining driving motion from in-the-wild videos later.
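To switch the reference image or motion sequence without editing the YAML by hand, something like the following would work. This is a sketch using PyYAML; the nesting under `data:` matches the inference config shown later in this commit, and the `ref-02` / `motion-02` names are placeholders for whichever example data you downloaded.
```python
# Hedged sketch: point configs/inference.yaml at another example.
# Assumes PyYAML is available and that ref_image_path / guidance_data_folder
# sit under the `data:` section. Note that safe_dump drops YAML comments.
import yaml

cfg_path = "configs/inference.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["data"]["ref_image_path"] = "example_data/ref_images/ref-02.png"      # placeholder
cfg["data"]["guidance_data_folder"] = "example_data/motions/motion-02"    # placeholder

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```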

# Acknowledgements
We thank the authors of [MagicAnimate](https://github.com/magic-research/magic-animate), [Animate Anyone](https://github.com/HumanAIGC/AnimateAnyone), and [AnimateDiff](https://github.com/guoyww/AnimateDiff) for their excellent work. Our project is built upon [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone), and we are grateful for their open-source contributions.

# Citation
If you find our work useful for your research, please consider citing the paper:
```
@inproceedings{zhu2024champ,
  author    = {Shenhao Zhu and Junming Leo Chen and Zuozhuo Dai and Yinghui Xu and Xun Cao and Yao Yao and Hao Zhu and Siyu Zhu},
  title     = {Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance},
  booktitle = {arXiv},
  year      = {2024}
}
```
@@ -0,0 +1,63 @@
exp_name: Animation
width: 512
height: 512
data:
  ref_image_path: 'example_data/ref_images/ref-01.png'    # reference image path
  guidance_data_folder: 'example_data/motions/motion-01'  # corresponding motion sequence folder
seed: 42

base_model_path: 'pretrained_models/stable-diffusion-v1-5'
vae_model_path: 'pretrained_models/sd-vae-ft-mse'
image_encoder_path: 'pretrained_models/image_encoder'

ckpt_dir: 'pretrained_models/champ'
motion_module_path: 'pretrained_models/champ/motion_module.pth'

num_inference_steps: 20
guidance_scale: 3.5
enable_zero_snr: true
weight_dtype: "fp16"

guidance_types:
  - 'depth'
  - 'normal'
  - 'semantic_map'
  - 'dwpose'

noise_scheduler_kwargs:
  num_train_timesteps: 1000
  beta_start: 0.00085
  beta_end: 0.012
  beta_schedule: "linear"
  steps_offset: 1
  clip_sample: false

unet_additional_kwargs:
  use_inflated_groupnorm: true
  unet_use_cross_frame_attention: false
  unet_use_temporal_attention: false
  use_motion_module: true
  motion_module_resolutions:
    - 1
    - 2
    - 4
    - 8
  motion_module_mid_block: true
  motion_module_decoder_only: false
  motion_module_type: Vanilla
  motion_module_kwargs:
    num_attention_heads: 8
    num_transformer_block: 1
    attention_block_types:
      - Temporal_Self
      - Temporal_Self
    temporal_position_encoding: true
    temporal_position_encoding_max_len: 32
    temporal_attention_dim_div: 1

guidance_encoder_kwargs:
  guidance_embedding_channels: 320
  guidance_input_channels: 3
  block_out_channels: [16, 32, 96, 256]

enable_xformers_memory_efficient_attention: true
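As a side note, `noise_scheduler_kwargs` uses the parameter names of diffusers schedulers, so a rough, purely illustrative mapping onto a `DDIMScheduler` might look like the sketch below. Treating `enable_zero_snr` as the zero-terminal-SNR rescaling flag is our assumption; the repository's inference code is authoritative.
```python
# Illustration only -- how the scheduler settings above could be instantiated with diffusers.
# Mapping enable_zero_snr to rescale_betas_zero_snr is an assumption, not repo code.
from diffusers import DDIMScheduler

scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="linear",
    steps_offset=1,
    clip_sample=False,
    rescale_betas_zero_snr=True,   # plausible counterpart of enable_zero_snr: true
)
scheduler.set_timesteps(20)        # num_inference_steps: 20 from this config
```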