Add USES model for speech enhancement in diverse conditions #5482

Emrys365 · 2023-10-18T02:17:44Z

What?

This PR adds a new speech enhancement model USES, which is capable of handling diverse input conditions with a single model:

different sampling frequencies
different microphone variations (1 or more channels with different array geometries)
different signal lengths (with approximately linear complexity)

Along with the USES model implementation, I also updated the related functions required to train this new model, including:

espnet2/iterators/chunk_iter_factory.py: A new argument default_fs is added to ChunkIterFactory to allow adaptive adjustment of chunk_length according to the sampling frequency of the batch. The related argument is also added to espnet2/tasks/abs_task.py.
espnet2/train/preprocessor.py: Add sampling frequency information to EnhPreprocessor. Also make some existing functions more flexible.
espnet2/enh/encoder/stft_encoder.py and espnet2/enh/decoder/stft_decoder.py: Add two methods reset_config and reconfig_for_fs to allow adaptive adjustment of n_fft, win_length, and hop_length according to the sampling frequency of the input signal.
espnet2/enh/loss/criterions/time_domain.py: Update MultiResL1SpecLoss to support different reduction modes and variance normalization before loss calculation.
espnet2/enh/layers/dptnet.py: Add a new option ("linear") for the argument activation, which will be used in the new model.
espnet2/enh/layers/uses.py and espnet2/enh/separator/uses_separator.py: Implementation of the new model.
espnet2/enh/espnet_model.py:
- Add options for variance normalization (and de-normalization) before (after) model forward.
- Add category-related processing in the model forward method.
- Add sampling frequency related processing in the model forward method.
- Add category-specific logging for training, i.e., losses from batches of different categories are reported with different suffixes.
espnet2/torch_utils/recursive_op.py: Modify how data from different nodes is accumulated in DDP mode so that it is compatible with the category-specific logging in espnet2/enh/espnet_model.py.
espnet2/bin/enh_inference.py: Add category-related processing and variance normalization/de-normalization.
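
The sampling-frequency-dependent adjustments listed above (chunk_length in ChunkIterFactory; n_fft, win_length, and hop_length in the STFT encoder/decoder) can be read as rescaling sample counts so that durations in seconds stay constant. Below is a minimal sketch of that idea, assuming plain proportional scaling. The helper names `scale_samples` and `stft_config_for_fs` and the example values are hypothetical and are not taken from the ESPnet code, which may round or constrain the values differently.

```python
from typing import Dict


def scale_samples(num_samples: int, fs: int, default_fs: int) -> int:
    """Rescale a sample count defined at default_fs to sampling rate fs.

    Hypothetical helper: keeps the duration in seconds constant.
    """
    if fs <= 0 or default_fs <= 0:
        raise ValueError("sampling frequencies must be positive")
    return num_samples * fs // default_fs


def stft_config_for_fs(
    n_fft: int, win_length: int, hop_length: int, fs: int, default_fs: int
) -> Dict[str, int]:
    """Return STFT sizes (in samples) adapted from default_fs to fs."""
    return {
        "n_fft": scale_samples(n_fft, fs, default_fs),
        "win_length": scale_samples(win_length, fs, default_fs),
        "hop_length": scale_samples(hop_length, fs, default_fs),
    }


# Assumed reference configuration at 8 kHz: 32 ms window, 16 ms hop.
cfg_16k = stft_config_for_fs(256, 256, 128, fs=16000, default_fs=8000)
# A 4-second chunk defined at 8 kHz (32000 samples), rescaled for a 48 kHz batch.
chunk_48k = scale_samples(32000, fs=48000, default_fs=8000)
```

Under this scaling the window spans the same 32 ms at every rate, so the spacing between frequency bins in Hz (fs / n_fft) is unchanged across sampling frequencies.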

Why?

This new SE model and the related function updates can increase the flexibility of the current SE framework.

Codecov Report

Merging #5482 (389b9cd) into master (d57ff4e) will increase coverage by 0.02%.
Report is 2 commits behind head on master.
The diff coverage is 80.34%.

@@            Coverage Diff             @@
##           master    #5482      +/-   ##
==========================================
+ Coverage   75.40%   75.43%   +0.02%     
==========================================
  Files         709      711       +2     
  Lines       65361    65757     +396     
==========================================
+ Hits        49288    49606     +318     
- Misses      16073    16151      +78

Flag	Coverage Δ
test_configuration_espnet2	`∅ <ø> (∅)`
test_integration_espnet1	`65.67% <ø> (ø)`
test_integration_espnet2	`48.51% <38.24%> (-0.17%)`	⬇️
test_python_espnet1	`19.03% <0.00%> (-0.12%)`	⬇️
test_python_espnet2	`51.61% <68.16%> (+0.13%)`	⬆️
test_utils	`23.10% <ø> (ø)`

Files	Coverage Δ
espnet2/enh/decoder/abs_decoder.py	`100.00% <ø> (ø)`
espnet2/enh/decoder/conv_decoder.py	`100.00% <100.00%> (ø)`
espnet2/enh/decoder/null_decoder.py	`100.00% <100.00%> (ø)`
espnet2/enh/decoder/stft_decoder.py	`97.40% <100.00%> (+0.43%)`	⬆️
espnet2/enh/encoder/abs_encoder.py	`87.50% <ø> (ø)`
espnet2/enh/encoder/conv_encoder.py	`100.00% <100.00%> (ø)`
espnet2/enh/encoder/null_encoder.py	`90.00% <100.00%> (ø)`
espnet2/enh/encoder/stft_encoder.py	`98.59% <100.00%> (+0.28%)`	⬆️
espnet2/enh/layers/dptnet.py	`100.00% <100.00%> (ø)`
espnet2/enh/layers/tcn.py	`95.67% <100.00%> (+0.04%)`	⬆️
... and 10 more

LiChenda · 2023-10-18T12:07:16Z

I'll review it.

sw005320 · 2023-10-21T11:14:02Z

@kohei0209, please review this PR.

kohei0209 · 2023-10-22T12:58:08Z

Sure, I'll review this PR.

sw005320 · 2023-10-27T14:16:11Z

@kohei0209 and @LiChenda, can you review this PR?

kohei0209

@Emrys365, I left several comments. Could you check them?

espnet2/enh/espnet_model.py

espnet2/enh/layers/dptnet.py

espnet2/train/preprocessor.py

LiChenda

Hi, @Emrys365 , I updated some comments for this PR.

LiChenda · 2023-10-30T06:20:56Z

espnet2/bin/enh_inference.py

It looks like some new code blocks are not covered by the test. Could you also update the unit test?

espnet2/enh/espnet_model.py

espnet2/enh/encoder/stft_encoder.py

espnet2/enh/decoder/stft_decoder.py

espnet2/enh/espnet_model.py

LiChenda

Thank you @Emrys365 ! Now, it looks good to me!

sw005320

LGTM

sw005320 · 2023-10-31T11:30:29Z

espnet2/bin/enh_inference.py

@@ -220,6 +220,37 @@ def __call__(
        speech_mix = to_device(speech_mix, device=self.device)
        lengths = to_device(lengths, device=self.device)

+        ###################################
+        # Normalize the signal variance


These normalizations look good, but we need to be careful when running in an online (streaming) manner.

Just a note.
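
To illustrate the concern, here is a minimal sketch of variance normalization before the model and de-normalization after it, followed by a comparison of utterance-level and chunk-level statistics. Everything in it is an assumption for illustration: the helper names `signal_std` and `enhance_with_variance_norm`, the use of the standard deviation of the whole utterance as the scale, and the identity function standing in for the model. It is not the code from espnet2/bin/enh_inference.py.

```python
import math
from typing import Callable, List, Sequence


def signal_std(x: Sequence[float], eps: float = 1e-8) -> float:
    """Population standard deviation of a 1-D signal, floored at eps."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return max(math.sqrt(var), eps)


def enhance_with_variance_norm(
    mix: Sequence[float], model: Callable[[List[float]], List[float]]
) -> List[float]:
    """Normalize by the utterance std, run the model, then undo the scaling."""
    std = signal_std(mix)
    normalized = [v / std for v in mix]
    enhanced = model(normalized)
    return [v * std for v in enhanced]


def identity(x: List[float]) -> List[float]:
    """Placeholder for an enhancement model (assumption for this sketch)."""
    return list(x)


# Toy signal: quiet first half, loud second half.
mix = [0.01, -0.01] * 4 + [1.0, -1.0] * 4
out = enhance_with_variance_norm(mix, identity)

# Offline, the scale comes from the whole utterance ...
utt_std = signal_std(mix)
# ... but a streaming system only sees one chunk at a time.
chunk_stds = [signal_std(mix[i : i + 8]) for i in (0, 8)]
```

In this toy case the two chunks have very different standard deviations, and neither matches the utterance-level value, so a streaming implementation would need a different way to estimate the scale (for example a running estimate) than the offline one.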

sw005320 · 2023-10-31T17:56:27Z

Please fix the CI error
https://github.com/espnet/espnet/actions/runs/6698386172/job/18202217083?pr=5482#step:16:2840

kohei0209

@Emrys365 Looks good to me, thank you for reflecting the comments!

sw005320 · 2023-11-01T12:10:46Z

Thanks!

Emrys365 added 2 commits October 17, 2023 21:54

Add USES model for speech enhancement in diverse conditions

76d2353

Add USES model for speech enhancement in diverse conditions

8914285

Emrys365 added New Features ESPnet2 SE Speech enhancement labels Oct 18, 2023

mergify bot added the conflicts label Oct 18, 2023

Resolve conflicts

df3fa33

mergify bot removed the conflicts label Oct 18, 2023

Emrys365 changed the title ~~Espnet2 enh~~ Add USES model for speech enhancement in diverse conditions Oct 18, 2023

[pre-commit.ci] auto fixes from pre-commit.com hooks

283ba85

for more information, see https://pre-commit.ci

Emrys365 and others added 2 commits October 17, 2023 23:37

Fix unit test errors

92467ab

[pre-commit.ci] auto fixes from pre-commit.com hooks

ec28a22

for more information, see https://pre-commit.ci

sw005320 added this to the v.202312 milestone Oct 18, 2023

Emrys365 assigned LiChenda Oct 18, 2023

Emrys365 and others added 6 commits October 18, 2023 11:32

Improve the unit test speed for test/espnet2/layers/test_augmentation.py

2cc5c52

Set @torch.cuda.amp.autocast(enabled=False) for GlobalLayerNorm and ChannelwiseLayerNorm

e6ad242

Update MultiResL1SpecLoss

0e29817

Update espnet2/enh/espnet_model.py to handle multi-condition data

6d706d6

[pre-commit.ci] auto fixes from pre-commit.com hooks

1f013bf

for more information, see https://pre-commit.ci

Update espnet2/enh/espnet_model.py to handle multi-condition data

f4e9ebd

Emrys365 mentioned this pull request Oct 18, 2023

Add a new SE recipe combining five public corpora #5484

Merged

Emrys365 added 3 commits October 19, 2023 15:57

Add category information related processing in EnhPreprocessor

2279867

Add category information related processing in EnhPreprocessor

92b9405

Fix a bug when speech_mix has more than 1 channel

1962a5e

Emrys365 assigned kohei0209 Oct 22, 2023

kan-bayashi modified the milestones: v.202310, v.202312 Oct 25, 2023

kohei0209 reviewed Oct 29, 2023

espnet2/enh/espnet_model.py (outdated)

espnet2/enh/layers/dptnet.py (outdated)

espnet2/train/preprocessor.py (outdated)

espnet2/train/preprocessor.py (outdated)

LiChenda reviewed Oct 30, 2023

Emrys365 added 2 commits October 30, 2023 15:44

Reflect comments

3a73f5b

Fix unit test errors

92ef5c8

LiChenda approved these changes Oct 31, 2023

sw005320 approved these changes Oct 31, 2023

Emrys365 added 2 commits October 31, 2023 17:22

Merge branch 'master' of github.com:espnet/espnet into espnet2_enh

4318b7e

Fix error in integration test

389b9cd

kohei0209 approved these changes Nov 1, 2023

sw005320 merged commit e349294 into espnet:master Nov 1, 2023
24 checks passed

Emrys365 mentioned this pull request Apr 24, 2024

Add implementations of USES2 speech enhancement models #5761

Open
