
Commit

Add files via upload
YuanxunLu authored Sep 23, 2021
1 parent 194b424 commit eb6a56d
Showing 37 changed files with 5,065 additions and 2 deletions.
36 changes: 34 additions & 2 deletions README.md
@@ -8,7 +8,7 @@ This repository contains the implementation of the following paper:
>
> **Abstract**: To the best of our knowledge, we first present a live system that generates personalized photorealistic talking-head animation only driven by audio signals at over 30 fps. Our system contains three stages. The first stage is a deep neural network that extracts deep audio features along with a manifold projection to project the features to the target person's speech space. In the second stage, we learn facial dynamics and motions from the projected audio features. The predicted motions include head poses and upper body motions, where the former is generated by an autoregressive probabilistic model which models the head pose distribution of the target person. Upper body motions are deduced from head poses. In the final stage, we generate conditional feature maps from previous predictions and send them with a candidate image set to an image-to-image translation network to synthesize photorealistic renderings. Our method generalizes well to wild audio and successfully synthesizes high-fidelity personalized facial details, e.g., wrinkles, teeth. Our method also allows explicit control of head poses. Extensive qualitative and quantitative evaluations, along with user studies, demonstrate the superiority of our method over state-of-the-art techniques.
>
> [[Project Page]](https://yuanxunlu.github.io/projects/LiveSpeechPortraits/) [[Paper]](./doc/SIGGSIGGRAPH_Asia_2021__Live_Speech_Portraits__Real_Time_Photorealistic_Talking_Head_Animation.pdf)
> [[Project Page]](https://yuanxunlu.github.io/projects/LiveSpeechPortraits/) [[Paper]](https://yuanxunlu.github.io/projects/LiveSpeechPortraits/resources/SIGGRAPH_Asia_2021__Live_Speech_Portraits__Real_Time_Photorealistic_Talking_Head_Animation.pdf) [[Arxiv]](https://arxiv.org/abs/2109.10595)
![Teaser](./doc/Teaser.jpg)

@@ -18,13 +18,45 @@ Figure 1. Given an arbitrary input audio stream, our system generates personalized ...

## Requirements

- This project was successfully trained and tested on Windows 10 with PyTorch 1.7 (Python 3.6). Linux and lower PyTorch versions should also work (not tested). We recommend creating a new environment:

```
conda create -n LSP python=3.6
conda activate LSP
```

- Clone the repository:

```
git clone https://github.com/YuanxunLu/LiveSpeechPortraits.git
cd LiveSpeechPortraits
```

- FFmpeg is required to combine the audio with the silent generated videos (an example invocation is shown at the end of the Demo section). Please check [FFmpeg](http://ffmpeg.org/download.html) for installation. On Linux, you can also run:

```
sudo apt-get install ffmpeg
```

- Install the dependencies:

```
pip install -r requirements.txt
```



## Demo

- Download the pre-trained models and data from [Google Drive]() to the folder `data`. Data for five subjects are released (May, Obama1, Obama2, Nadella, and McStay). The expected folder layout is sketched below.
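
Judging from the paths referenced by the configs and the demo command, the downloaded `data` folder should end up looking roughly like this (May shown; the other subjects follow the same pattern, and additional files such as the candidate image sets may also be present):

```
data
├── APC_epoch_160.model
├── input
│   └── 00083.wav
└── May
    ├── 3d_fit_data.npz
    ├── checkpoints
    │   ├── Audio2Feature.pkl
    │   ├── Audio2Headpose.pkl
    │   └── Feature2Face.pkl
    └── tracked3D_normalized_pts_fix_contour.npy
```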

## Usage
- Run the demo:

```
python demo.py --id May --driving_audio ./data/input/00083.wav
```

Results can be found under the `results` folder.
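
If you ever need to mux the driving audio into a silent rendered clip yourself, a standard FFmpeg invocation such as the following works (file names here are illustrative only; it copies the video stream unchanged and encodes the audio to AAC):

```
ffmpeg -i silent.mp4 -i audio.wav -c:v copy -c:a aac -shortest output.mp4
```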



32 changes: 32 additions & 0 deletions config/May.yaml
@@ -0,0 +1,32 @@
model_params:
    APC:
        ckp_path: './data/APC_epoch_160.model'
        mel_dim: 80
        hidden_size: 512
        num_layers: 3
        residual: false
        use_LLE: 1
        Knear: 10
        LLE_percent: 1
    Audio2Mouth:
        ckp_path: './data/May/checkpoints/Audio2Feature.pkl'
        smooth: 1.5
        AMP: ['XYZ', 2, 2, 2]  # method, x, y, z
    Headpose:
        ckp_path: './data/May/checkpoints/Audio2Headpose.pkl'
        sigma: 0.3
        smooth: [5, 10]  # rot, trans
        AMP: [1, 0.5]  # rot, trans
        shoulder_AMP: 0.5
    Image2Image:
        ckp_path: './data/May/checkpoints/Feature2Face.pkl'
        size: 'large'
        save_input: 1

dataset_params:
    root: './data/May/'
    fit_data_path: './data/May/3d_fit_data.npz'
    pts3d_path: './data/May/tracked3D_normalized_pts_fix_contour.npy'
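
These per-subject configs are plain YAML, so they are easy to inspect programmatically. Below is a minimal loading sketch assuming PyYAML is installed; the actual parsing code inside `demo.py` may differ:

```
import yaml  # PyYAML; install with `pip install pyyaml` if needed

# Hypothetical helper: load a per-subject config such as config/May.yaml
with open('config/May.yaml') as f:
    cfg = yaml.safe_load(f)

# The nested keys mirror the structure shown above
print(cfg['model_params']['APC']['ckp_path'])  # ./data/APC_epoch_160.model
print(cfg['dataset_params']['root'])           # ./data/May/
```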


32 changes: 32 additions & 0 deletions config/McStay.yaml
@@ -0,0 +1,32 @@
model_params:
    APC:
        ckp_path: './data/APC_epoch_160.model'
        mel_dim: 80
        hidden_size: 512
        num_layers: 3
        residual: false
        use_LLE: 1
        Knear: 10
        LLE_percent: 1
    Audio2Mouth:
        ckp_path: './data/McStay/checkpoints/Audio2Feature.pkl'
        smooth: 2
        AMP: ['XYZ', 1.5, 1.5, 1.5]  # method, x, y, z
    Headpose:
        ckp_path: './data/McStay/checkpoints/Audio2Headpose.pkl'
        sigma: 0.3
        smooth: [5, 10]  # rot, trans
        AMP: [1, 1]  # rot, trans
        shoulder_AMP: 0.5
    Image2Image:
        ckp_path: './data/McStay/checkpoints/Feature2Face.pkl'
        size: 'normal'
        save_input: 1

dataset_params:
    root: './data/McStay/'
    fit_data_path: './data/McStay/3d_fit_data.npz'
    pts3d_path: './data/McStay/tracked3D_normalized_pts_fix_contour.npy'


32 changes: 32 additions & 0 deletions config/Nadella.yaml
@@ -0,0 +1,32 @@
model_params:
    APC:
        ckp_path: './data/APC_epoch_160.model'
        mel_dim: 80
        hidden_size: 512
        num_layers: 3
        residual: false
        use_LLE: 1
        Knear: 10
        LLE_percent: 1
    Audio2Mouth:
        ckp_path: './data/Nadella/checkpoints/Audio2Feature.pkl'
        smooth: 2
        AMP: ['XYZ', 1.5, 1.5, 1.5]  # method, x, y, z
    Headpose:
        ckp_path: './data/Nadella/checkpoints/Audio2Headpose.pkl'
        sigma: 0.3
        smooth: [5, 10]  # rot, trans
        AMP: [0.5, 0.5]  # rot, trans
        shoulder_AMP: 0.5
    Image2Image:
        ckp_path: './data/Nadella/checkpoints/Feature2Face.pkl'
        size: 'normal'
        save_input: 1

dataset_params:
    root: './data/Nadella/'
    fit_data_path: './data/Nadella/3d_fit_data.npz'
    pts3d_path: './data/Nadella/tracked3D_normalized_pts_fix_contour.npy'


32 changes: 32 additions & 0 deletions config/Obama1.yaml
@@ -0,0 +1,32 @@
model_params:
    APC:
        ckp_path: './data/APC_epoch_160.model'
        mel_dim: 80
        hidden_size: 512
        num_layers: 3
        residual: false
        use_LLE: 1
        Knear: 10
        LLE_percent: 1
    Audio2Mouth:
        ckp_path: './data/Obama1/checkpoints/Audio2Feature.pkl'
        smooth: 1
        AMP: ['XYZ', 1.5, 1.5, 1.5]  # method, x, y, z
    Headpose:
        ckp_path: './data/Obama1/checkpoints/Audio2Headpose.pkl'
        sigma: 0.3
        smooth: [2, 8]  # rot, trans
        AMP: [1, 1]  # rot, trans
        shoulder_AMP: 0.5
    Image2Image:
        ckp_path: './data/Obama1/checkpoints/Feature2Face.pkl'
        size: 'normal'
        save_input: 1

dataset_params:
    root: './data/Obama1/'
    fit_data_path: './data/Obama1/3d_fit_data.npz'
    pts3d_path: './data/Obama1/tracked3D_normalized_pts_fix_contour.npy'


32 changes: 32 additions & 0 deletions config/Obama2.yaml
@@ -0,0 +1,32 @@
model_params:
    APC:
        ckp_path: './data/APC_epoch_160.model'
        mel_dim: 80
        hidden_size: 512
        num_layers: 3
        residual: false
        use_LLE: 1
        Knear: 10
        LLE_percent: 1
    Audio2Mouth:
        ckp_path: './data/Obama2/checkpoints/Audio2Feature.pkl'
        smooth: 2
        AMP: ['XYZ', 1.5, 1.5, 1.5]  # method, x, y, z
    Headpose:
        ckp_path: './data/Obama2/checkpoints/Audio2Headpose.pkl'
        sigma: 0.3
        smooth: [3, 10]  # rot, trans
        AMP: [1, 1]  # rot, trans
        shoulder_AMP: 0.5
    Image2Image:
        ckp_path: './data/Obama2/checkpoints/Feature2Face.pkl'
        size: 'normal'
        save_input: 1

dataset_params:
    root: './data/Obama2/'
    fit_data_path: './data/Obama2/3d_fit_data.npz'
    pts3d_path: './data/Obama2/tracked3D_normalized_pts_fix_contour.npy'

