PyTorch implementation of: Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

acetylSv/non-parallel-rhythm-flexible-VC

Non-parallel-rhythm-flexible-VC

PyTorch implementation of: Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

  • This repo is NOT completed yet.
  • Please open an issue if you find something weird or not working, thanks!

Samples

Samples can be found here; the corresponding experiment is described in Section 5.3 of the paper. Only the conventional and proposed methods are compared.

Python and Toolkit Version

Python:   '3.5.2'
Numpy:    '1.16.2'
PyTorch:  '0.4.1'
Montreal Forced Aligner: '1.1.0'

Data Preprocessing (Frame-level phoneme boundary segmentation included)

  1. Download and decompress the VCTK corpus.
  2. Put the text files and audio files under the same directory, then run rename.sh.
  3. Run align_VCTK.sh to get the alignment results.
  4. Set the path info in config/config.yaml.
  5. Run preprocess.py to generate acoustic features with their corresponding phone labels.
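
As an illustration of what "frame-level phoneme boundary segmentation" involves, the sketch below turns aligner intervals into one phone label per acoustic frame. It is not taken from preprocess.py: the function name, the (start, end, phone) tuple layout, the hop size, and the use of 'sp' for uncovered frames are all assumptions made for this example.

```python
import math
from typing import List, Tuple


def intervals_to_frame_labels(
    intervals: List[Tuple[float, float, str]],
    n_frames: int,
    hop_seconds: float,
    pad_label: str = "sp",  # assumed filler for frames no interval covers
) -> List[str]:
    """Assign one phone label per frame, based on each frame's start time.

    Frame i is taken to start at i * hop_seconds and receives the phone of
    the interval [start, end) that contains that time.
    """
    labels = [pad_label] * n_frames
    eps = 1e-9  # guards against floating-point error in the divisions below
    for start, end, phone in intervals:
        first = max(0, math.ceil(start / hop_seconds - eps))
        last = min(n_frames, math.ceil(end / hop_seconds - eps))
        for i in range(first, last):
            labels[i] = phone
    return labels


# Hypothetical alignment: two phones covering the first 50 ms of an utterance.
example = intervals_to_frame_labels(
    [(0.0, 0.03, "HH"), (0.03, 0.05, "AH")], n_frames=6, hop_seconds=0.01
)
```

Using the frame start time as the decision point is only one possible convention; the frame centre is an equally reasonable choice and shifts boundaries by at most one frame.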

Configuration and Usage

  1. All hyperparameters are listed in this .yaml file.
  2. Every module is trained by calling main.py with different arguments.

usage: main.py [-h] [--config CONFIG] 
               [--seed SEED] [--train | --test]
               [--ppr | --ppts | --uppt] 
               [--spk_id SPK_ID] [--A_id A_ID] [--B_id B_ID] 
               [--pre_train]
  1. The detailed usage of each module is listed below.
  2. The paths for logging and model saving must be specified in the config file first.
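
For readers who want to see how the flags in the usage text fit together, here is a minimal argparse sketch reconstructed only from that usage string. It is not the repository's main.py: the argument types, the seed default, and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the command-line interface shown in the usage text (a sketch)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--config", type=str, help="path to the .yaml config")
    parser.add_argument("--seed", type=int, default=0)  # default is assumed

    # [--train | --test]: at most one of the two modes may be given.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--train", action="store_true")
    mode.add_argument("--test", action="store_true")

    # [--ppr | --ppts | --uppt]: at most one module may be selected.
    module = parser.add_mutually_exclusive_group()
    module.add_argument("--ppr", action="store_true")
    module.add_argument("--ppts", action="store_true")
    module.add_argument("--uppt", action="store_true")

    parser.add_argument("--spk_id", type=str, help="speaker for PPTS")
    parser.add_argument("--A_id", type=str, help="source speaker for UPPT")
    parser.add_argument("--B_id", type=str, help="target speaker for UPPT")
    parser.add_argument("--pre_train", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(
    ["--config", "config/config.yaml", "--train", "--uppt",
     "--pre_train", "--A_id", "all", "--B_id", "all"]
)
```

With mutually exclusive groups, passing both --train and --test makes argparse exit with a usage error, which matches the `[--train | --test]` notation in the usage text.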

PPR

Example script

Training

python3 main.py --config [path-to-config] --train --ppr

Evaluation

python3 main.py --config [path-to-config] --test --ppr

PPTS

Example script

Training

python3 main.py --config [path-to-config] --train --ppts \
                --spk_id [which-speaker-to-train]

Evaluation

python3 main.py --config [path-to-config] --test --ppts \
                --spk_id [which-speaker-to-train]

UPPT (CycleGAN ver.)

Example script

AE Pre-Training

python3 main.py --config [path-to-config] --train --uppt \
    --pre_train --A_id [src-speaker] --B_id [tgt-speaker]
  • If A_id and B_id are both set to "all", pre-training uses data from two groups of speakers (fast and slow) instead of two single speakers.
  • Example:
     ... --A_id all --B_id all

Training

python3 main.py --config [path-to-config] --train --uppt \
    --A_id [src-speaker] --B_id [tgt-speaker]

Evaluation

python3 main.py --config [path-to-config] --test --uppt \
    --A_id [src-speaker] --B_id [tgt-speaker]

UPPT (StarGAN ver.)

Example script

AE Pre-Training

python3 star_main.py --config [path-to-config] --train --uppt --pre_train

Training

python3 star_main.py --config [path-to-config] --train --uppt

Evaluation

python3 star_main.py --config [path-to-config] --test --uppt \
    --tgt_id [tgt-speaker]

Notes

  1. In MFA, the phoneme 'spn' means "unknown", so it is currently mapped to id 0, the same id as 'sp'.
  2. Is 'sp' a good choice for padding, or would 'sil' be better?
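
The mapping described in these notes could look roughly like the sketch below. It is an illustration, not the repository's code: the function names, the sorted id order for the remaining phones, and padding with id 0 are assumptions.

```python
from typing import Dict, Iterable, List

# Both symbols share id 0, as described in the note (assumed to be the pad id too).
SILENCE_LIKE = ("sp", "spn")


def build_phone_to_id(phones: Iterable[str]) -> Dict[str, int]:
    """Map 'sp' and 'spn' to id 0 and every other phone to a unique id >= 1."""
    phone_to_id = {p: 0 for p in SILENCE_LIKE}
    others = sorted(set(phones) - set(SILENCE_LIKE))
    for next_id, phone in enumerate(others, start=1):
        phone_to_id[phone] = next_id
    return phone_to_id


def pad_ids(ids: List[int], length: int, pad_id: int = 0) -> List[int]:
    """Right-pad a label sequence to `length` with the pad id (here the 'sp' id)."""
    return ids + [pad_id] * max(0, length - len(ids))


# Hypothetical phone inventory and label sequence.
table = build_phone_to_id(["AH", "sp", "HH", "spn"])
padded = pad_ids([table[p] for p in ["HH", "AH", "spn"]], length=5)
```

One consequence of this design is that padded frames are indistinguishable from genuine short pauses; switching to a dedicated 'sil' or pad symbol would only require giving it its own id in the table.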

TODO

  • Add a logging method to the solver, removing the redundant summary-writing code in both train and eval
  • Build the whole conversion pipeline, adding functions to load from a specified path at inference time
  • StarGAN inference
