ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

This repository provides the official PyTorch implementation of ContentVec.

This is a short video that explains the main concepts of our work. If you find this work useful and use it in your research, please consider citing our paper.

Pre-trained models

Note: the download links currently have issues, which we will fix as soon as possible. Until then, please request the pretrained models by email.

Model              Classes  Link
ContentVec_legacy  100      download
ContentVec         100      download
ContentVec_legacy  500      download
ContentVec         500      download

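Load a model without setting up the code repo

import fairseq.checkpoint_utils

# Load the checkpoint with a plain fairseq installation; the first ensemble member is the model.
ckpt_path = "/path/to/the/checkpoint_best_legacy.pt"
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]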

Train a new model

Data preparation

Download the zip file, which contains the following files:

  • {train,valid}.tsv: waveform list files, in metadata
  • {train,valid}.km: frame-aligned pseudo label files, in labels
  • dict.km.txt: a dummy dictionary, in labels
  • spk2info.dict: a dictionary mapping from speaker id to speaker embedding, in metadata
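Before training, it can help to confirm that the waveform list and the label file describe the same utterances. The sketch below is not part of this repository. It assumes the usual fairseq manifest layout (first line is the audio root directory, each later line is a relative path and a sample count separated by a tab) and one line of space-separated integer labels per utterance in the .km file. The function names are hypothetical.

```python
from pathlib import Path


def read_manifest(tsv_path):
    """Return (root, [(relative_path, num_samples), ...]).

    Assumes the fairseq-style layout: line 1 is the root directory,
    every later line is "<relative_path>\t<num_samples>".
    """
    lines = Path(tsv_path).read_text(encoding="utf-8").splitlines()
    if not lines:
        raise ValueError(f"{tsv_path} is empty")
    root = lines[0].strip()
    entries = []
    for line in lines[1:]:
        if not line.strip():
            continue  # tolerate stray blank lines
        rel_path, num_samples = line.split("\t")[:2]
        entries.append((rel_path, int(num_samples)))
    return root, entries


def read_labels(km_path):
    """Return one list of integer pseudo labels per line (one line per utterance)."""
    text = Path(km_path).read_text(encoding="utf-8")
    return [[int(tok) for tok in line.split()] for line in text.splitlines()]


def check_alignment(tsv_path, km_path):
    """Pair each utterance with its label count, or raise if the files disagree.

    Returns a list of (relative_path, num_samples, num_labels) tuples.
    """
    _, entries = read_manifest(tsv_path)
    labels = read_labels(km_path)
    if len(entries) != len(labels):
        raise ValueError(
            f"{len(entries)} utterances in {tsv_path} but "
            f"{len(labels)} label lines in {km_path}"
        )
    return [(rel, n, len(lab)) for (rel, n), lab in zip(entries, labels)]
```

Calling check_alignment on a matching pair such as train.tsv and train.km returns one tuple per utterance. Comparing num_samples against num_labels in each tuple gives a rough view of how many audio samples each label frame covers.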

Modify the root directory in the {train,valid}.tsv waveform list files so that it matches the location of your audio files.
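The root directory can be edited by hand, or with a small script like the sketch below. It assumes the root directory is the first line of each .tsv file, as in the usual fairseq manifest layout. The function name set_manifest_root is hypothetical and not part of this repository.

```python
from pathlib import Path


def set_manifest_root(tsv_path, new_root):
    """Replace the first line (assumed to be the root directory) of a manifest.

    Returns the previous root so the change can be logged or reverted.
    """
    path = Path(tsv_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines:
        raise ValueError(f"{tsv_path} is empty")
    old_root = lines[0]
    lines[0] = str(new_root)
    # Rewrite the file in place, keeping all utterance rows unchanged.
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return old_root
```

For example, set_manifest_root("metadata/train.tsv", "/path/to/your/audio") followed by the same call for metadata/valid.tsv updates both splits. Both paths here are placeholders for your own locations.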

Set up the code repo

Follow the steps in setup.sh to set up the code repo.

Pretrain ContentVec

Use run_pretrain_single.sh to run on a single node.

Use run_pretrain_multi.sh and the corresponding slurm template to run on multiple GPUs and nodes.
