Add USES model for speech enhancement in diverse conditions #5482
This pull request is now in conflict :(
for more information, see https://pre-commit.ci
Codecov Report
@@ Coverage Diff @@
## master #5482 +/- ##
==========================================
+ Coverage 75.40% 75.43% +0.02%
==========================================
Files 709 711 +2
Lines 65361 65757 +396
==========================================
+ Hits 49288 49606 +318
- Misses 16073 16151 +78
I'll review it.
…hannelwiseLayerNorm
@kohei0209, please review this PR.
Sure, I'll review this PR.
@kohei0209 and @LiChenda, can you review this PR?
@Emrys365, I left several comments. Could you check them?
Hi @Emrys365, I updated some comments for this PR.
It looks like some new code blocks are not covered by the test. Could you also update the unit test?
Thank you, @Emrys365! It looks good to me now!
LGTM
@@ -220,6 +220,37 @@ def __call__(
speech_mix = to_device(speech_mix, device=self.device)
lengths = to_device(lengths, device=self.device)

###################################
# Normalize the signal variance
These normalizations look good, but we need to be careful when using them in an online (streaming) manner.
Just a note.
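To make the streaming concern concrete, here is a minimal sketch of utterance-level variance normalization around an enhancement step. It is not the code from this PR: it uses NumPy instead of torch, assumes a single-channel `(batch, samples)` input, and the names `normalize_variance` and `restore_scale` are hypothetical. The statistic is computed over the whole utterance, which is exactly what is unavailable when audio arrives chunk by chunk.

```python
import numpy as np


def normalize_variance(speech_mix, eps=1e-8):
    """Scale each utterance in a (batch, samples) array to unit standard deviation.

    Hypothetical helper for illustration; returns the scaled signal and the
    per-utterance standard deviation needed to undo the scaling later.
    """
    std = np.std(speech_mix, axis=1, keepdims=True)
    return speech_mix / (std + eps), std


def restore_scale(enhanced, std):
    """Undo the normalization so the output matches the input level."""
    return enhanced * std


rng = np.random.default_rng(0)
mix = 0.05 * rng.standard_normal((2, 16000))  # two quiet 1-second signals at 16 kHz

normed, std = normalize_variance(mix)
enhanced = normed  # placeholder: a real enhancement model would run here
restored = restore_scale(enhanced, std)
```

In a streaming setting the full-utterance `std` would have to be replaced by something causal, for example a running estimate, and that changes the scale the model sees at the start of the stream.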
Please fix the CI error
@Emrys365 Looks good to me, thank you for reflecting the comments!
Thanks!
What?
This PR adds a new speech enhancement model, USES, which is capable of handling diverse input conditions with a single model.
Along with the USES model implementation, I also update the related functions that are required to train this new model, including:
- `default_fs` is added to `ChunkIterFactory` to allow adaptive adjustment of `chunk_length` according to the sampling frequency of the batch. The related argument is also added to espnet2/tasks/abs_task.py.
- `EnhPreprocessor`. Also improves some existing functions for more flexibility.
- `reset_config` and `reconfig_for_fs`, to allow adaptive adjustment of `n_fft`, `win_length`, and `hop_length` according to the sampling frequency of the input signal.
- `MultiResL1SpecLoss`, to support different reduction modes and variance normalization before loss calculation.
- `activation`, which will be used in the new model.
- `espnet2/enh/layers/uses.py` and `espnet2/enh/separator/uses_separator.py`: Implementation of the new model.
- `espnet2/enh/espnet_model.py`.
Why?
This new SE model and the related function updates can increase the flexibility of the current SE framework.
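As an illustration of that flexibility, the sampling-frequency-dependent adjustments listed under "What?" can be sketched as simple rescaling. This is an assumption about the general idea, not the PR's actual code: `scale_stft_config` and `scale_chunk_length` are hypothetical names, and the rule "keep the duration in seconds fixed by scaling sample counts with `fs / default_fs`" is one plausible choice.

```python
def scale_stft_config(n_fft, win_length, hop_length, default_fs, fs):
    """Rescale sample-based STFT settings defined at default_fs to a new rate fs.

    Hypothetical helper: keeps the window and hop durations (in seconds)
    roughly constant by scaling the sample counts with fs / default_fs.
    """
    return {
        "n_fft": n_fft * fs // default_fs,
        "win_length": win_length * fs // default_fs,
        "hop_length": hop_length * fs // default_fs,
    }


def scale_chunk_length(chunk_length, default_fs, fs):
    """Rescale a chunk length given in samples at default_fs to the rate fs."""
    return chunk_length * fs // default_fs


# Settings defined for 16 kHz audio, applied to a 48 kHz batch.
config_48k = scale_stft_config(512, 512, 128, default_fs=16000, fs=48000)

# A 2-second chunk at 16 kHz stays a 2-second chunk at 8 kHz.
chunk_8k = scale_chunk_length(32000, default_fs=16000, fs=8000)
```

With this kind of scaling, a 32 ms window at 16 kHz remains a 32 ms window at 48 kHz, so one model configuration can be reused across sampling rates.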
See also
A pre-trained model is available on Hugging Face.