
Add USES model for speech enhancement in diverse conditions #5482

Merged
merged 19 commits into espnet:master on Nov 1, 2023


Emrys365
Collaborator

What?

This PR adds a new speech enhancement model USES, which is capable of handling diverse input conditions with a single model:

  • different sampling frequencies
  • different microphone variations (1 or more channels with different array geometries)
  • different signal lengths (with approximately linear complexity)

Along with the USES model implementation, I also updated the related functions required to train this new model, including:

  • espnet2/iterators/chunk_iter_factory.py: A new argument default_fs is added to ChunkIterFactory to allow adaptive adjustment of chunk_length according to the sampling frequency of the batch. The related argument is also added to espnet2/tasks/abs_task.py.
  • espnet2/train/preprocessor.py: Add sampling frequency information to EnhPreprocessor. Also make some existing functions more flexible.
  • espnet2/enh/encoder/stft_encoder.py and espnet2/enh/decoder/stft_decoder.py: Add two methods reset_config and reconfig_for_fs to allow adaptive adjustment of n_fft, win_length, and hop_length according to the sampling frequency of the input signal.
  • espnet2/enh/loss/criterions/time_domain.py: Update MultiResL1SpecLoss to support different reduction modes and variance normalization before loss calculation.
  • espnet2/enh/layers/dptnet.py: Add a new option ("linear") for the argument activation, which will be used in the new model.
  • espnet2/enh/layers/uses.py and espnet2/enh/separator/uses_separator.py: Implementation of the new model.
  • espnet2/enh/espnet_model.py
    • Add options for variance normalization (and de-normalization) before (after) model forward.
    • Add category-related processing in the model forward method.
    • Add sampling frequency related processing in the model forward method.
    • Add category-specific logging for training, i.e., losses from batches of different categories are reported with different suffixes.
  • espnet2/torch_utils/recursive_op.py: Modify how data from different nodes are accumulated in DDP mode so that it is compatible with the category-specific logging in espnet2/enh/espnet_model.py.
  • espnet2/bin/enh_inference.py: Add category-related processing and variance normalization/de-normalization.
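As a rough illustration of the sampling-frequency adaptation listed above (the default_fs argument of ChunkIterFactory and reconfig_for_fs in the STFT encoder/decoder), the sketch below scales sample counts defined at a reference rate so that their duration in seconds stays fixed. This is a minimal sketch, assuming linear scaling by fs / default_fs; StftConfig, scale_to_fs, and reconfig_stft_for_fs are hypothetical helpers, not ESPnet's API, and the rounding ESPnet actually uses may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StftConfig:
    """Hypothetical container for STFT parameters defined at a reference rate."""

    n_fft: int
    win_length: int
    hop_length: int


def scale_to_fs(length: int, fs: int, default_fs: int) -> int:
    """Scale a sample count defined at `default_fs` so that it covers the same
    duration (in seconds) at sampling frequency `fs` (assumed linear scaling)."""
    if fs <= 0 or default_fs <= 0:
        raise ValueError("sampling frequencies must be positive")
    return round(length * fs / default_fs)


def reconfig_stft_for_fs(cfg: StftConfig, fs: int, default_fs: int) -> StftConfig:
    """Return STFT parameters whose window and hop durations match those of `cfg`."""
    return StftConfig(
        n_fft=scale_to_fs(cfg.n_fft, fs, default_fs),
        win_length=scale_to_fs(cfg.win_length, fs, default_fs),
        hop_length=scale_to_fs(cfg.hop_length, fs, default_fs),
    )


if __name__ == "__main__":
    # Hypothetical reference setup: 32 ms window / 16 ms hop and 4 s chunks at 16 kHz.
    base = StftConfig(n_fft=512, win_length=512, hop_length=256)
    for fs in (8000, 16000, 48000):
        print(
            fs,
            reconfig_stft_for_fs(base, fs, default_fs=16000),
            "chunk_length:",
            scale_to_fs(64000, fs, default_fs=16000),
        )
```

With these example values, the frequency spacing between STFT bins stays at fs / n_fft = 31.25 Hz for every rate while the number of bins grows with fs, which is one plausible motivation for duration-preserving scaling when a single model must accept several sampling frequencies.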

Why?

These changes make the current SE framework more flexible: a single model can handle inputs with different sampling frequencies, microphone configurations, and signal lengths.

See also

A pre-trained model is available on HuggingFace.

Emrys365 added the New Features, ESPnet2, and SE (Speech enhancement) labels on Oct 18, 2023
mergify bot commented Oct 18, 2023

This pull request is now in conflict :(

mergify bot added the conflicts label on Oct 18, 2023
mergify bot removed the conflicts label on Oct 18, 2023
Emrys365 changed the title from "Espnet2 enh" to "Add USES model for speech enhancement in diverse conditions" on Oct 18, 2023
codecov bot commented Oct 18, 2023

Codecov Report

Merging #5482 (389b9cd) into master (d57ff4e) will increase coverage by 0.02%.
Report is 2 commits behind head on master.
The diff coverage is 80.34%.

@@            Coverage Diff             @@
##           master    #5482      +/-   ##
==========================================
+ Coverage   75.40%   75.43%   +0.02%     
==========================================
  Files         709      711       +2     
  Lines       65361    65757     +396     
==========================================
+ Hits        49288    49606     +318     
- Misses      16073    16151      +78     
Flag Coverage Δ
test_configuration_espnet2 ∅ <ø> (∅)
test_integration_espnet1 65.67% <ø> (ø)
test_integration_espnet2 48.51% <38.24%> (-0.17%) ⬇️
test_python_espnet1 19.03% <0.00%> (-0.12%) ⬇️
test_python_espnet2 51.61% <68.16%> (+0.13%) ⬆️
test_utils 23.10% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files Coverage Δ
espnet2/enh/decoder/abs_decoder.py 100.00% <ø> (ø)
espnet2/enh/decoder/conv_decoder.py 100.00% <100.00%> (ø)
espnet2/enh/decoder/null_decoder.py 100.00% <100.00%> (ø)
espnet2/enh/decoder/stft_decoder.py 97.40% <100.00%> (+0.43%) ⬆️
espnet2/enh/encoder/abs_encoder.py 87.50% <ø> (ø)
espnet2/enh/encoder/conv_encoder.py 100.00% <100.00%> (ø)
espnet2/enh/encoder/null_encoder.py 90.00% <100.00%> (ø)
espnet2/enh/encoder/stft_encoder.py 98.59% <100.00%> (+0.28%) ⬆️
espnet2/enh/layers/dptnet.py 100.00% <100.00%> (ø)
espnet2/enh/layers/tcn.py 95.67% <100.00%> (+0.04%) ⬆️
... and 10 more


sw005320 added this to the v.202312 milestone on Oct 18, 2023
LiChenda
Contributor

I'll review it.

sw005320
Contributor

@kohei0209, please review this PR.

kohei0209
Contributor

Sure, I'll review this PR.

kan-bayashi modified the milestones: v.202310, v.202312 on Oct 25, 2023
sw005320
Contributor

@kohei0209 and @LiChenda, can you review this PR?

kohei0209 (Contributor) left a comment


@Emrys365, I left several comments. Could you check them?

espnet2/enh/espnet_model.py (outdated, resolved)
espnet2/enh/layers/dptnet.py (outdated, resolved)
espnet2/train/preprocessor.py (outdated, resolved)
espnet2/train/preprocessor.py (outdated, resolved)
LiChenda (Contributor) left a comment


Hi @Emrys365, I left some comments on this PR.


It looks like some new code blocks are not covered by the tests. Could you also update the unit tests?

espnet2/enh/espnet_model.py (resolved)
espnet2/enh/encoder/stft_encoder.py (outdated, resolved)
espnet2/enh/decoder/stft_decoder.py (outdated, resolved)
espnet2/enh/espnet_model.py (outdated, resolved)
espnet2/enh/espnet_model.py (outdated, resolved)
LiChenda (Contributor) left a comment


Thank you, @Emrys365! It looks good to me now.

sw005320 (Contributor) left a comment


LGTM

@@ -220,6 +220,37 @@ def __call__(
speech_mix = to_device(speech_mix, device=self.device)
lengths = to_device(lengths, device=self.device)

###################################
# Normalize the signal variance

These normalizations look good, but be careful when using them in an online (streaming) manner.
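The offline variance normalization referenced in the diff, and the streaming caveat raised above, can be sketched as follows. This is a minimal sketch, assuming the mixture is divided by its full-utterance standard deviation before the model and the output is multiplied back afterwards; normalize_variance, denormalize_variance, and running_std are hypothetical names, and ESPnet's actual statistic and epsilon may differ. In a streaming setting only the samples received so far are available, so a running estimate (here computed with Welford's algorithm) can differ from the offline value until enough audio has arrived.

```python
import numpy as np


def normalize_variance(mix: np.ndarray, eps: float = 1e-8):
    """Offline: scale the whole mixture to (approximately) unit standard deviation.

    Returns the normalized signal and the scale needed to undo the normalization.
    """
    std = float(np.std(mix)) + eps
    return mix / std, std


def denormalize_variance(est: np.ndarray, std: float) -> np.ndarray:
    """Restore the original signal scale on the enhanced output."""
    return est * std


def running_std(chunks, eps: float = 1e-8):
    """Streaming: yield, after each chunk, the std of all samples seen so far.

    Uses Welford's online update of count, mean, and sum of squared deviations,
    so it matches np.std (ddof=0) of the concatenated signal at the end.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for chunk in chunks:
        for x in chunk:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
        yield (m2 / count) ** 0.5 + eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Quiet first half, loud second half (synthetic example signal).
    mix = np.concatenate([0.1 * rng.standard_normal(8000), rng.standard_normal(8000)])
    normed, std = normalize_variance(mix)
    restored = denormalize_variance(normed, std)  # identity "model" in between
    print("round trip ok:", np.allclose(restored, mix))
    print("offline std:", std)
    print("running std after each chunk:", list(running_std(np.split(mix, 4))))
```

In the demo, the first chunks are quiet and the later ones are loud, so the running estimate after the first chunk is much smaller than the offline value; normalizing that chunk with it would amplify it far more than offline processing would.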


Just a note.

kohei0209 (Contributor) left a comment


@Emrys365 Looks good to me. Thank you for addressing the comments!

sw005320 merged commit e349294 into espnet:master on Nov 1, 2023
24 checks passed
sw005320 commented on Nov 1, 2023

Thanks!


5 participants