forked from yerfor/GeneFace

Commit 921631d (0 parents)

Showing 158 changed files with 120,911 additions and 0 deletions.

.gitignore
@@ -0,0 +1,161 @@
# big files
data_util/face_tracking/3DMM/01_MorphableModel.mat
data_util/face_tracking/3DMM/3DMM_info.npy
data_util/BFM_models/BFM_model_front.mat
!/data_util/BFM_models/.gitkeep
deep_3drecon/BFM/Exp_Pca.bin
deep_3drecon/BFM/01_MorphableModel.mat
deep_3drecon/BFM/BFM_model_front.mat
deep_3drecon/network/FaceReconModel.pb
.vscode
### Project ignore
/checkpoints/*
!/checkpoints/.gitkeep
/data/*
!/data/.gitkeep
!data/raw/videos/May.mp4
!data/raw/val_wavs/zozo.wav
infer_out
rsync
.idea
.DS_Store
bak
tmp
*.tar.gz
mos
nbs
/configs_usr/*
!/configs_usr/.gitkeep
/egs_usr/*
!/egs_usr/.gitkeep
/rnnoise
#/usr/*
#!/usr/.gitkeep
scripts_usr

# Created by .ignore support plugin (hsz.mobi)
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

README-zh.md
@@ -0,0 +1,89 @@
# GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR'23

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%3CCOLOR%3E.svg)](https://arxiv.org/abs/2301.13430) | [![GitHub Stars](https://img.shields.io/github/stars/yerfor/GeneFace)](https://github.com/yerfor/GeneFace) | ![visitors](https://visitor-badge.glitch.me/badge?page_id=yerfor/GeneFace)

This repository is the official PyTorch implementation of our [ICLR-2023 paper](https://arxiv.org/abs/2301.13430), in which we propose **GeneFace** for generalized and high-fidelity audio-driven talking face video synthesis.

<p align="center">
<br>
<img src="assets/GeneFace.png" width="1000"/>
<br>
</p>

Our GeneFace achieves better lip synchronization and expressiveness on out-of-domain audio (e.g., audio from unseen speakers or in other languages). We recommend watching [this video](https://geneface.github.io/GeneFace/example_show_improvement.mp4) for a lip-sync comparison between GeneFace and previous NeRF-based talking face synthesis methods. You can also visit our [project page](https://geneface.github.io/) for more details.

## Quick Start!

We provide [pre-trained GeneFace models](https://drive.google.com/drive/folders/1L87ZuvC3BOPdWZ7fALdUKYcIt4pWXtDz?usp=share_link) to enable a quick start. If you want to train GeneFace on your own target person video, please follow the steps in `docs/prepare_env`, `docs/process_data`, and `docs/train_models`.

Step 1. We provide the pre-trained Audio2motion model (the Variational Motion Generator in the figure above) at [this link](https://drive.google.com/drive/folders/1qsYYWmyiDnf0v5AAF9EplAaoO6DLxjFd?usp=share_link). You can download it and place it into `checkpoints/lrs3/lm3d_vae`.

Step 2. We provide the pre-trained Post-net (the Domain Adaptative Post-net in the figure above), pre-trained on `data/raw/videos/May.mp4`, at [this link](https://drive.google.com/drive/folders/1prLZYmyiMoCeuaBYdTJwFArQbHb_70O5?usp=share_link). You can download it and place it into `checkpoints/May/postnet`.

Step 3. We provide the pre-trained NeRF (the 3DMM NeRF Renderer in the figure above), pre-trained on `data/raw/videos/May.mp4`, at [this link](https://drive.google.com/drive/folders/1k-uk3Vya1esqozTM_PjntfYGXnqv7VCs?usp=share_link). You can download it and place it into `checkpoints/May/lm3d_nerf` and `checkpoints/May/lm3d_nerf_torso`.

After the above steps, the structure of your `checkpoints` directory should look like this:

```
> checkpoints
    > lrs3
        > lm3d_vae
        > syncnet
    > May
        > postnet
        > lm3d_nerf
        > lm3d_nerf_torso
```
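
As a quick sanity check, here is a minimal sketch (our own helper, not part of the repository's scripts) that verifies the expected checkpoint directories exist:

```python
import os

# Expected checkpoint layout after Steps 1-3 (see the tree above).
expected_dirs = [
    "checkpoints/lrs3/lm3d_vae",
    "checkpoints/lrs3/syncnet",
    "checkpoints/May/postnet",
    "checkpoints/May/lm3d_nerf",
    "checkpoints/May/lm3d_nerf_torso",
]

for d in expected_dirs:
    status = "ok" if os.path.isdir(d) else "MISSING"
    print(f"[{status}] {d}")
```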

Step 4. Run the following commands in the terminal:

```
bash scripts/infer_postnet.sh
bash scripts/infer_lm3d_nerf.sh
```

You can find the output video at `infer_out/May/pred_video/zozo.mp4`.

## Prepare the Environment

Please follow the steps in `docs/prepare_env`.

## Prepare Datasets

Please follow the steps in `docs/process_data`.

## Train Models

Please follow the steps in `docs/train_models`.

# Train GeneFace on other target person videos

Apart from the `May.mp4` provided in this repository, we also provide the 8 target person videos used in our experiments. You can download them at [this link](https://drive.google.com/drive/folders/1FwQoBd1ZrBJMrJE3ZzlNhK8xAe1OYGjX?usp=share_link).

To train on a new video named `<video_id>.mp4`, place it in the `data/raw/videos/` directory, then create a new folder at `egs/datasets/videos/<video_id>` and add the corresponding yaml config files, following the provided example folder `egs/datasets/videos/May`.

Besides the videos we provide, you can also record a video of yourself and train a unique GeneFace avatar of your own!

# Todo List

GeneFace uses 3D facial landmarks as the intermediate between the audio-to-motion and motion-to-image modules. However, the 3D landmark sequences generated by the Post-net sometimes contain bad cases (such as temporal jitter or an extra-large mouth), which in turn degrade the quality of the NeRF-rendered video. Currently, we partially alleviate this problem by post-processing the predicted landmark sequences, but the current post-processing is still rather simple and cannot resolve all bad cases, so we encourage better post-processing methods.

## Citation

```
@article{ye2023geneface,
  title={GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis},
  author={Ye, Zhenhui and Jiang, Ziyue and Ren, Yi and Liu, Jinglin and He, Jinzheng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2301.13430},
  year={2023}
}
```

## Acknowledgements

This work is influenced by the following repositories:

* [NATSpeech](https://github.com/NATSpeech/NATSpeech) (for the code framework)
* [AD-NeRF](https://github.com/YudongGuo/AD-NeRF) (for the NeRF-related implementation)
* [style_avatar](https://github.com/wuhaozhe/style_avatar) (for the 3DMM-related implementation)

README.md
@@ -0,0 +1,88 @@
# GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR'23

#### Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Jinzheng He, Zhou Zhao | Zhejiang University, ByteDance

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%3CCOLOR%3E.svg)](https://arxiv.org/abs/2301.13430) | [![GitHub Stars](https://img.shields.io/github/stars/yerfor/GeneFace)](https://github.com/yerfor/GeneFace) | ![visitors](https://visitor-badge.glitch.me/badge?page_id=yerfor/GeneFace) | [中文文档](README-zh.md)

This repository is the official PyTorch implementation of our [ICLR-2023 paper](https://arxiv.org/abs/2301.13430), in which we propose **GeneFace** for generalized and high-fidelity audio-driven talking face generation. The inference pipeline is as follows:

<p align="center">
<br>
<img src="assets/GeneFace.png" width="1000"/>
<br>
</p>

Our GeneFace achieves better lip synchronization and expressiveness on out-of-domain audio. Watch [this video](https://geneface.github.io/GeneFace/example_show_improvement.mp4) for a clear lip-sync comparison against previous NeRF-based methods. You can also visit our [project page](https://geneface.github.io/) for more details.
## Quick Start!

We provide [pre-trained models](https://drive.google.com/drive/folders/1L87ZuvC3BOPdWZ7fALdUKYcIt4pWXtDz?usp=share_link) of GeneFace to enable a quick start. If you want to train GeneFace on your own target person video, please follow the guides in `docs/prepare_env`, `docs/process_data`, and `docs/train_models`.

Step 1. We provide the pre-trained Audio2motion model (the Variational Motion Generator in the figure above) at [this link](https://drive.google.com/drive/folders/1qsYYWmyiDnf0v5AAF9EplAaoO6DLxjFd?usp=share_link). You can download it and place it into the directory `checkpoints/lrs3/lm3d_vae`.

Step 2. We provide the pre-trained Post-net (the Domain Adaptative Post-net in the figure above) for `data/raw/videos/May.mp4` at [this link](https://drive.google.com/drive/folders/1prLZYmyiMoCeuaBYdTJwFArQbHb_70O5?usp=share_link). You can download it and place it into the directory `checkpoints/May/postnet`.

Step 3. We provide the pre-trained NeRF (the 3DMM NeRF Renderer in the figure above) for `data/raw/videos/May.mp4` at [this link](https://drive.google.com/drive/folders/1k-uk3Vya1esqozTM_PjntfYGXnqv7VCs?usp=share_link). You can download it and place it into the directories `checkpoints/May/lm3d_nerf` and `checkpoints/May/lm3d_nerf_torso`.

After the above steps, the structure of your `checkpoints` directory should look like this:

```
> checkpoints
    > lrs3
        > lm3d_vae
        > syncnet
    > May
        > postnet
        > lm3d_nerf
        > lm3d_nerf_torso
```

Step 4. Run the scripts below:

```
bash scripts/infer_postnet.sh
bash scripts/infer_lm3d_nerf.sh
```

You can find the output video at `infer_out/May/pred_video/zozo.mp4`.

## Prepare Environments

Please follow the steps in `docs/prepare_env`.

## Prepare Datasets

Please follow the steps in `docs/process_data`.

## Train Models

Please follow the steps in `docs/train_models`.

# Train GeneFace on other target person videos

Apart from the `May.mp4` provided in this repo, we also provide 8 target person videos that were used in our experiments. You can download them at [this link](https://drive.google.com/drive/folders/1FwQoBd1ZrBJMrJE3ZzlNhK8xAe1OYGjX?usp=share_link). To train on a new video named `<video_id>.mp4`, you should place it into the `data/raw/videos/` directory, then create a new folder at `egs/datasets/videos/<video_id>` and edit the config files there, following the provided example folder `egs/datasets/videos/May`.
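
As an illustration, here is a minimal sketch of that setup. The video id `Obama` is hypothetical, and the sketch assumes the copied yaml files refer to the target person as `May`; verify the result by hand against the actual files.

```python
import os
import shutil

video_id = "Obama"  # hypothetical id for a new file data/raw/videos/Obama.mp4

src = "egs/datasets/videos/May"          # provided example config folder
dst = f"egs/datasets/videos/{video_id}"  # config folder for the new video
shutil.copytree(src, dst)

# Point the copied configs at the new video id. This assumes the yaml files
# reference the target person by the name "May"; check the output by hand.
for name in os.listdir(dst):
    if name.endswith(".yaml"):
        path = os.path.join(dst, name)
        with open(path) as f:
            text = f.read()
        with open(path, "w") as f:
            f.write(text.replace("May", video_id))
```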

You can also record your own video and train a unique GeneFace model for yourself!

# Todo List

GeneFace uses 3D landmarks as the intermediate representation between the audio-to-motion and motion-to-image mappings. However, the 3D landmark sequences generated by the post-net sometimes contain bad cases (such as a shaking head or an extra-large mouth) that degrade the quality of the rendered video. Currently, we partially alleviate this problem by post-processing the predicted 3D landmark sequences. We call for better post-processing methods.
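
For reference, one simple baseline for the jitter case is temporal smoothing of the predicted landmark sequence. Below is a minimal sketch (not the repository's actual post-processing code), assuming the sequence is a numpy array of shape `(T, K, 3)`:

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_landmark_seq(lm_seq: np.ndarray, window: int = 9, order: int = 2) -> np.ndarray:
    """Suppress frame-to-frame jitter in a 3D landmark sequence.

    lm_seq: array of shape (T, K, 3), i.e. T frames of K 3D landmarks.
    Applies a Savitzky-Golay filter along the time axis, which removes
    high-frequency jitter while roughly preserving the articulation.
    """
    T = lm_seq.shape[0]
    if T < window:  # too few frames to filter
        return lm_seq
    flat = lm_seq.reshape(T, -1)
    smoothed = savgol_filter(flat, window_length=window, polyorder=order, axis=0)
    return smoothed.reshape(lm_seq.shape)

# Usage: lm_seq = smooth_landmark_seq(lm_seq) before rendering with the NeRF.
```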

## Citation

```
@article{ye2023geneface,
  title={GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis},
  author={Ye, Zhenhui and Jiang, Ziyue and Ren, Yi and Liu, Jinglin and He, Jinzheng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2301.13430},
  year={2023}
}
```

## Acknowledgements

**Our code is based on the following repos:**

* [NATSpeech](https://github.com/NATSpeech/NATSpeech) (for the code template)
* [AD-NeRF](https://github.com/YudongGuo/AD-NeRF) (for the NeRF-related implementation)
* [style_avatar](https://github.com/wuhaozhe/style_avatar) (for 3DMM parameter extraction)