This repository provides the official PyTorch implementation of ContentVec.
This is a short video that explains the main concepts of our work.
If you find this work useful and use it in your research, please consider citing our paper:
https://proceedings.mlr.press/v162/qian22b.html
The legacy model contains only the representation module, so it can be loaded with a plain fairseq installation, without setting up this code repo.
Model | Classes | Checkpoint |
---|---|---|
ContentVec_legacy | 100 | download |
ContentVec | 100 | download |
ContentVec_legacy | 500 | download |
ContentVec | 500 | download |
import fairseq.checkpoint_utils
ckpt_path = "/path/to/the/checkpoint_best_legacy.pt"
# Returns the list of loaded models, the config, and the task.
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]
For detailed feature extraction steps, please refer to HuBERT.
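When extracting features, it can help to know how many frames to expect for a given waveform, for example to check that the features line up with frame-level labels. The sketch below computes that count under the assumption that the checkpoint uses the HuBERT base convolutional front end (kernel widths 10, 3, 3, 3, 3, 2, 2 with strides 5, 2, 2, 2, 2, 2, 2). That layout is an assumption, not something stated here, and `expected_num_frames` is a hypothetical helper rather than part of fairseq or this repo.

```python
# Assumed HuBERT-base convolutional front end: (kernel_width, stride) per layer.
# This layout is an assumption; check the checkpoint's config before relying on it.
ASSUMED_CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def expected_num_frames(num_samples, conv_layers=ASSUMED_CONV_LAYERS):
    """Return the number of output frames for a waveform of num_samples samples.

    Each unpadded 1-D convolution maps a length L to (L - kernel) // stride + 1.
    """
    length = num_samples
    for kernel, stride in conv_layers:
        if length < kernel:
            # The waveform is too short to produce even one frame.
            return 0
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    # One second of 16 kHz audio under the assumed layout.
    print(expected_num_frames(16000))  # prints 49
```

Under this assumed layout the total stride is 320 samples, so a 16 kHz waveform yields roughly one frame every 20 ms.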
Download the zip file, which contains the following files:
- {train,valid}.tsv: waveform list files, in metadata
- {train,valid}.km: frame-aligned pseudo label files, in labels
- dict.km.txt: a dummy dictionary, in labels
- spk2info.dict: a dictionary mapping from speaker id to speaker embedding, in metadata
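After unpacking, a quick sanity check is to confirm that each split has as many label lines as waveform entries. The sketch below assumes the HuBERT-style conventions: the first line of each .tsv file is the root directory, every following line describes one waveform, and each line of the .km file holds the labels for one waveform. These format details and the helper names (`count_manifest_entries`, `count_label_lines`, `check_split`) are assumptions made for illustration, not part of this repo.

```python
from pathlib import Path


def count_manifest_entries(tsv_path):
    """Count waveform entries in a .tsv list, skipping the root-directory line."""
    lines = Path(tsv_path).read_text().splitlines()
    # Assumption: line 0 is the root directory, the rest are one waveform each.
    return sum(1 for line in lines[1:] if line.strip())


def count_label_lines(km_path):
    """Count non-empty lines in a .km file (assumed: one line per waveform)."""
    lines = Path(km_path).read_text().splitlines()
    return sum(1 for line in lines if line.strip())


def check_split(metadata_dir, labels_dir, split):
    """Return (num_waveforms, num_label_lines, counts_match) for one split."""
    n_wav = count_manifest_entries(Path(metadata_dir) / f"{split}.tsv")
    n_lab = count_label_lines(Path(labels_dir) / f"{split}.km")
    return n_wav, n_lab, n_wav == n_lab
```

A call such as `check_split("metadata", "labels", "train")` would then report whether the two counts agree for the training split.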
- Modify the root directory in the {train,valid}.tsv waveform list files.
- Follow the steps in setup.sh to set up the code repo.
- Use run_pretrain_single.sh to run on a single node.
- Use run_pretrain_multi.sh and the corresponding slurm template to run on multiple GPUs and nodes.
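The first step, changing the root directory in the waveform list files, can be scripted rather than edited by hand. The sketch below assumes the root directory is stored on the first line of each .tsv file, as in HuBERT-style waveform lists. That format detail and the helper name `set_manifest_root` are assumptions made for illustration.

```python
from pathlib import Path


def set_manifest_root(tsv_path, new_root):
    """Replace the first line of a .tsv waveform list with new_root.

    Returns the previous root so the change can be logged or reverted.
    Assumption: the first line of the file holds the dataset root directory.
    """
    path = Path(tsv_path)
    lines = path.read_text().splitlines()
    if not lines:
        raise ValueError(f"{path} is empty; expected a root directory on line 1")
    old_root = lines[0]
    lines[0] = str(new_root)
    # Write the file back with the remaining entries unchanged.
    path.write_text("\n".join(lines) + "\n")
    return old_root
```

For example, calling `set_manifest_root("metadata/train.tsv", "/path/to/dataset")` and then the same for `valid.tsv` would point both lists at a local copy of the audio. The paths here are placeholders.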