Code for vector quantizing speech features, including mel-spectrograms and phonetic posteriorgrams / bottleneck features (BNFs). This repo trains an independent module to vector quantize BNFs.
For usage in voice conversion, see here
- Install ffmpeg.
- Install Kaldi.
- Install PyKaldi.
- Install the required packages using the environment.yml file.
- Download the pretrained TDNN-F model, extract it, and set `PRETRAIN_ROOT` in `kaldi_scripts/extract_features_kaldi.sh` to the pretrained model directory.
- Acoustic Model: trained on LibriSpeech. Download the pretrained TDNN-F acoustic model here.
- You also need to set `KALDI_ROOT` and `PRETRAIN_ROOT` in `kaldi_scripts/extract_features_kaldi.sh` accordingly.
- Vector Quantization: ARCTIC and L2-ARCTIC; see here for the detailed training process.
All the pretrained models are available here (to be updated).
```
dataset_root
├── speaker 1
├── speaker 2
│   ├── wav    # contains all the wav files from speaker 2
│   └── kaldi  # Kaldi files (auto-generated after running kaldi_scripts)
.
.
└── speaker N
```
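As a quick sanity check before running feature extraction, a small helper like the following (hypothetical, not part of this repo) can verify that every speaker directory contains the expected `wav` subfolder; the `kaldi` folder is auto-generated later, so only `wav` is required up front:

```python
import os

def check_dataset_layout(dataset_root):
    """Return the speaker directories that are missing a 'wav' subfolder.

    Hypothetical helper, not part of this repo: it only checks for 'wav',
    since 'kaldi' is created automatically by the extraction scripts.
    """
    missing = []
    for speaker in sorted(os.listdir(dataset_root)):
        speaker_dir = os.path.join(dataset_root, speaker)
        if not os.path.isdir(speaker_dir):
            continue  # skip stray files at the dataset root
        if not os.path.isdir(os.path.join(speaker_dir, "wav")):
            missing.append(speaker)
    return missing
```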
- Use Kaldi to extract BNFs for each speaker (do this for all speakers):
```
./kaldi_scripts/extract_features_kaldi.sh /path/to/speaker
```
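Since the script must be run once per speaker, a small wrapper like this (hypothetical, assuming the dataset layout above) can build the per-speaker commands; by default it is a dry run that only returns the commands instead of executing them:

```python
import os
import subprocess

def extract_all_speakers(dataset_root, dry_run=True):
    """Build (and optionally run) the Kaldi extraction command per speaker.

    Hypothetical wrapper around ./kaldi_scripts/extract_features_kaldi.sh;
    with dry_run=True it only collects the commands without running anything.
    """
    commands = []
    for speaker in sorted(os.listdir(dataset_root)):
        speaker_dir = os.path.join(dataset_root, speaker)
        if not os.path.isdir(speaker_dir):
            continue  # ignore stray files at the dataset root
        cmd = ["./kaldi_scripts/extract_features_kaldi.sh", speaker_dir]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # run Kaldi feature extraction
    return commands
```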
- Preprocessing:
```
python preprocess_bnfs.py path/to/dataset
python make_data_all.py  # edit the file to specify the dataset path
```
- Set the training parameters. See `conf/`.
- Train the VQ model:
```
./train.sh
```
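At inference time, vector quantization boils down to nearest-neighbor codebook lookup: each BNF frame is replaced by the closest vector in a learned codebook. A minimal NumPy sketch of that lookup (illustrative only; the actual model is configured under `conf/` and trained by `train.sh`):

```python
import numpy as np

def quantize(frames, codebook):
    """Replace each feature frame with its nearest codebook vector.

    frames:   (T, D) array of BNF frames
    codebook: (K, D) array of learned code vectors
    Returns (indices, quantized), where quantized[t] = codebook[indices[t]].
    Illustrative sketch only; the repo's trained VQ module does this step.
    """
    # Squared Euclidean distance between every frame and every code vector
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)     # (T,) index of nearest code per frame
    return indices, codebook[indices]  # (T,) indices, (T, D) quantized frames
```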