Skip to content

A time-domain extension to "Perceptual Contrast Stretching on Target Feature for Speech Enhancement"

Notifications You must be signed in to change notification settings

YinPing-Cho/PCS-FIR-Filter

Repository files navigation

PCS-FIR-Filter

Based on the spectral perceptual gains from the official PCS repo, this project aims to derive the equivalent linear-time-invariant (LTI) finite-impulse-response (FIR) filter coefficients to allow Perceptual Contrast Stretching (PCS) be performed directly on waveforms.

FIR filtering is a differentiable operation, which makes it ideal for Deep Learning applications working directly on waveforms. The FIR filtering example in this project is performed with PyTorch 1-D convolution layer. Of course, the derived filter coefficients (in numpy format) can also be easily applied to other backends.

Requirements

torch >= 1.8
torchaudio
matplotlib
Soundfile
numpy
scipy

Available in requirements.txt

Usage

  1. Filter design:
  • python PCS_coeffs_generate.py --mode='manual' generates FIR filter coefficients (in *.npy format) and impulse response plot under directory generated_freq_response/ with default spectral PCS coefficients.
  • Since the original PCS (spectral PCS) works on log-1-p spectrograms, the nonlinearity cannot be reproduced directly with LTI FIR filters; therefore, python PCS_coeffs_generate.py provides two additional statistical filter design methods to approximate the behavior of spectral PCS:
    • python PCS_coeffs_generate.py --mode='statistical' --stat_mode='gaussian' measures and approximate spectral PCS's equivalent LTI impulse response with Gaussian signals of varying standard deviations.
    • python PCS_coeffs_generate.py --mode='statistical' --stat_mode='wav' --wav_dir='*' measures and approximate spectral PCS's equivalent LTI impulse response with the .wav files you placed in wav_dir.
  1. FIR Filtering with wave-PCS:
  • python test_PCS_wave.py performs wave-PCS with the FIR filter coefficients derived by PCS_coeffs_generate.py and outputs filtered audio.
  1. Quick comparison to spectral PCS:
  • python test_PCS_spectral.py performs spectral PCS with official repo's PCS functions. This snippet is meant for comparing how the FIR wave-PCS's result compares to the original spectral PCS.

Example Results

  • Frequency response of the FIR filter coefficients derived from the default PCS settings with GAIN_SMOOTHING = 0.2:

- Spectra comparison of befer and after PCS:

  • Frequency response of the FIR filter coefficients derived with audio-wav-based statistical method with Mpop600 Mandarin singing voice dataset:

- Spectra comparison of befer and after PCS:

Reference

  • The official repo of PCS (https://github.com/RoyChao19477/PCS).
  • The original PCS paper: Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao, "Perceptual Contrast Stretching on Target Feature for Speech Enhancement," (http://arxiv.org/abs/2203.17152)
  • Mpop600 Mandarin singing voice dataset: C. -C. Chu, F. -R. Yang, Y. -J. Lee, Y. -W. Liu and S. -H. Wu, "MPop600: A Mandarin Popular Song Database with Aligned Audio, Lyrics, and Musical Scores for Singing Voice Synthesis," 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2020, pp. 1647-1652.

About

A time-domain extension to "Perceptual Contrast Stretching on Target Feature for Speech Enhancement"

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages