ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

This repository provides the official PyTorch implementation of ContentVec.

This is a short video that explains the main concepts of our work. If you find this work useful and use it in your research, please consider citing our paper.

Cite this paper

https://proceedings.mlr.press/v162/qian22b.html

Pre-trained models

The legacy model contains only the representation module, so it can be loaded with a plain fairseq installation without setting up this code repo.

Model	Classes	Checkpoint
ContentVec_legacy	100	download
ContentVec	100	download
ContentVec_legacy	500	download
ContentVec	500	download

Load a model

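import fairseq.checkpoint_utils
# load_model_ensemble_and_task returns a list of models; take the first one.
ckpt_path = "/path/to/the/checkpoint_best_legacy.pt"
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]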

For detailed feature extraction steps, please refer to HuBERT.

Train a new model

Data preparation

Download the zip file consisting of the following files:

{train,valid}.tsv waveform list files in metadata
{train,valid}.km frame-aligned pseudo label files in labels
dict.km.txt a dummy dictionary in labels
spk2info.dict a dictionary mapping from speaker id to speaker embedding in metadata
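
As a quick sanity check on the downloaded files, the sketch below compares a waveform list with its label file. It assumes the fairseq HuBERT conventions, which this README does not spell out: the first line of the .tsv file is the audio root directory, each following line describes one utterance, and the .km file holds one line of space-separated integer labels per utterance in the same order. The function name `check_split` and the `num_classes` parameter are hypothetical and not part of this repository.

```python
from pathlib import Path


def check_split(tsv_path, km_path, num_classes):
    """Return the number of utterances after checking tsv/km consistency.

    Assumed layout (fairseq HuBERT convention, not confirmed by this README):
    the first tsv line is the root directory, and the km file has one line
    of space-separated integer labels per utterance, in the same order.
    """
    tsv_lines = Path(tsv_path).read_text().splitlines()
    km_lines = Path(km_path).read_text().splitlines()

    utterances = tsv_lines[1:]  # skip the root-directory line
    if len(utterances) != len(km_lines):
        raise ValueError(
            f"{len(utterances)} utterances in tsv but {len(km_lines)} label lines"
        )

    for idx, line in enumerate(km_lines):
        labels = [int(tok) for tok in line.split()]
        if any(lab < 0 or lab >= num_classes for lab in labels):
            raise ValueError(f"label out of range on km line {idx + 1}")

    return len(utterances)
```

For example, `check_split("metadata/train.tsv", "labels/train.km", 100)` would be the call for the 100-class setup, provided the files are laid out as assumed above.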

Modify the root directory in the {train,valid}.tsv waveform list files to match the location of your audio data.
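
Changing the root directory can be scripted rather than done by hand. The following is a minimal sketch, assuming the root directory is stored on the first line of each .tsv file (the fairseq manifest convention, which this README does not state explicitly). The helper name `set_manifest_root` and the example path are hypothetical.

```python
from pathlib import Path


def set_manifest_root(tsv_path, new_root):
    """Replace the first line of a waveform list with new_root.

    Assumes the first line of the tsv file is the audio root directory
    (fairseq manifest convention); all other lines are left unchanged.
    """
    path = Path(tsv_path)
    lines = path.read_text().splitlines()
    if not lines:
        raise ValueError(f"{tsv_path} is empty")
    lines[0] = str(new_root)
    path.write_text("\n".join(lines) + "\n")


# Example usage with a placeholder path (adjust to your own setup):
# for split in ("train", "valid"):
#     set_manifest_root(f"metadata/{split}.tsv", "/path/to/your/audio")
```

The function rewrites the file in place, so keeping a copy of the original .tsv files is a reasonable precaution.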

Setup code repo

Follow the steps in setup.sh to set up the code repo.

Pretrain ContentVec

Use run_pretrain_single.sh to run on a single node.

Use run_pretrain_multi.sh and the corresponding slurm template (contentvec_pretrain.slurm) to run on multiple GPUs and nodes.
