
We implemented the DEMUCS model for speech enhancement in the time-frequency domain, and additionally implemented HD-DEMUCS.


SherryYu33/DEMUCS-for-Speech-Enhancement

 
 


DEMUCS-for-Speech-Enhancement

Welcome to the DEMUCS-for-Speech-Enhancement repository.

DEMUCS is a source separation model proposed by Facebook (now Meta) that has received considerable attention for its fast processing speed and strong performance. It was later applied to speech enhancement, where it also performed well [1]. This repository provides the following:

  1. An implementation of HD-DEMUCS [2]
  2. DEMUCS in the time-frequency domain
  3. HD-DEMUCS in the time-frequency domain

Results are reported at the end of this README, where you can compare the performance of HD-DEMUCS and the time-frequency-domain models.
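To make "time-frequency domain" concrete, here is a minimal, dependency-free sketch of a short-time Fourier transform (STFT), which turns a waveform into a sequence of per-frame spectra. It is an illustration only and is not taken from this repository: the function name `stft`, the frame length, and the hop size are assumptions chosen to keep the example small. In PyTorch code this step is typically done with `torch.stft`.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Naive STFT: Hann-windowed frames, one DFT per frame (non-negative bins only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Apply the window to the current frame of the waveform.
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the frame, keeping bins 0 .. frame_len / 2.
        spectrum = [
            sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# Toy input: a sine with exactly 2 cycles per 8-sample frame, 16 samples long.
wave = [math.sin(2 * math.pi * 2 * n / 8) for n in range(16)]
spec = stft(wave)

# Magnitude per (frame, bin); the strongest bin in each frame marks the sine's frequency.
magnitudes = [[abs(c) for c in frame] for frame in spec]
peak_bins = [max(range(len(m)), key=m.__getitem__) for m in magnitudes]
print(len(spec), "frames x", len(spec[0]), "bins; peak bins:", peak_bins)
```

A time-domain model operates on `wave` directly, whereas a time-frequency-domain model operates on a representation like `spec` (complex values or magnitudes, depending on the design).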

Update

  • 2023.11.06

Requirements

This repo was developed on Ubuntu 22.04 with PyTorch 2.0.1, Python 3.10, and CUDA 11.7. Install the package dependencies with:

pip install -r requirements.txt    
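After installing, it can help to confirm that the local environment roughly matches the versions listed above. The following is a minimal sketch, not part of this repository: `describe_environment` and `EXPECTED_PYTHON` are hypothetical names, and the PyTorch details are reported only if `torch` is importable.

```python
import importlib.util
import sys

EXPECTED_PYTHON = (3, 10)  # Python version stated in the requirements above


def describe_environment():
    """Collect the interpreter version and, if PyTorch is installed, its version and CUDA build."""
    info = {
        "python": "%d.%d" % sys.version_info[:2],
        "python_matches": sys.version_info[:2] == EXPECTED_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if info["torch_installed"]:
        import torch

        info["torch"] = torch.__version__
        info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
    return info


env = describe_environment()
for key, value in env.items():
    print(f"{key}: {value}")
```

A mismatch does not necessarily mean the code will fail; the versions above are simply the ones the repo was developed with.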

Dataset Installation

The first step is to set up the dataset used to train and evaluate the model. This project uses a combination of the Voice Bank corpus and the DEMAND database.

Voice Bank + DEMAND Dataset: The dataset combines clean speech from the Voice Bank corpus and various types of noise from the DEMAND database to simulate realistic noisy speech conditions.

Download: https://datashare.ed.ac.uk/handle/10283/1942

Getting Started

  1. Install the necessary libraries.
  2. Set the directory paths for your dataset in options.py:
# dataset path
noisy_dirs_for_train = '../Dataset/train/noisy/'   
noisy_dirs_for_valid = '../Dataset/valid/noisy/'   
  3. Run train_interface.py
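Before starting training, you may want to confirm that the configured directories exist and contain audio. The sketch below is not part of this repository: `summarize_dataset_dirs` is a hypothetical helper, the two paths are copied from the options.py excerpt above, and it assumes the audio files use the `.wav` extension.

```python
from pathlib import Path


def summarize_dataset_dirs(dirs):
    """Map each configured name to its .wav file count, or None if the directory is missing."""
    summary = {}
    for name, directory in dirs.items():
        path = Path(directory)
        if path.is_dir():
            summary[name] = sum(1 for p in path.iterdir() if p.suffix.lower() == ".wav")
        else:
            summary[name] = None
    return summary


# Paths copied from the options.py excerpt above; adjust them to your layout.
configured = {
    "noisy_dirs_for_train": "../Dataset/train/noisy/",
    "noisy_dirs_for_valid": "../Dataset/valid/noisy/",
}

for name, count in summarize_dataset_dirs(configured).items():
    status = "missing" if count is None else f"{count} wav files"
    print(f"{name}: {status}")
```

A "missing" entry or a count of 0 usually means the path in options.py does not match where the dataset was extracted.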

Architecture

Results

References

[1] Defossez, Alexandre, Gabriel Synnaeve, and Yossi Adi. "Real time speech enhancement in the waveform domain." arXiv preprint arXiv:2006.12847 (2020).
[2] Kim, Doyeon, et al. "HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders." arXiv preprint arXiv:2306.01411 (2023).

Contact

E-mail: jbcha7@yonsei.ac.kr

