Fix to use proper activation function in contextual block conformer encoder #5467
base: master
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #5467      +/-   ##
==========================================
+ Coverage   77.14%   77.16%   +0.01%
==========================================
  Files         684      684
  Lines       62713    62713
==========================================
+ Hits        48383    48391       +8
+ Misses     14330    14322       -8
==========================================
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
@espnetUser and @D-Keqi,
OK, I expect I can test the performance later this week (the GPU server is currently under maintenance). If @espnetUser already has some results, it would be great to let us know.
@D-Keqi: I don't have results yet and am currently tied up with other work. I can post results later, once I get time to run another training with and without this fix for the activation function.
OK, I expect I might be able to give it a go later next week due to the GPU server maintenance.
I gave it a go on Aishell-1 and found that ReLU actually works better than Swish for the linear PositionwiseFeedForward layer in my case. So we might want to leave the current version unchanged, @sw005320. If @espnetUser has other results to share, those would be very welcome too.
What?
#5453: Added the missing activation parameter to the linear PositionwiseFeedForward layer arguments of the contextual block Conformer encoder.
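A minimal sketch of what the change amounts to, assuming ESPnet's `PositionwiseFeedForward` signature `(idim, hidden_units, dropout_rate, activation)` and its `get_activation` helper; the size and dropout values below are illustrative, not taken from the actual PR diff:

```python
# Hedged sketch of the fix, not the exact PR diff. Module paths and the
# PositionwiseFeedForward signature follow the ESPnet codebase; the
# concrete values are illustrative only.
from espnet.nets.pytorch_backend.nets_utils import get_activation
from espnet.nets.pytorch_backend.transformer.positionwise_feed_forward import (
    PositionwiseFeedForward,
)

output_size, linear_units, dropout_rate = 256, 2048, 0.1

# Resolve the configured activation (Conformer recipes usually pass "swish").
activation = get_activation("swish")

# Before the fix, `activation` was missing from this argument tuple, so the
# layer silently fell back to its ReLU default.
positionwise_layer_args = (
    output_size,
    linear_units,
    dropout_rate,
    activation,
)
ff = PositionwiseFeedForward(*positionwise_layer_args)
```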
Why?
In the Conformer encoder, the linear PositionwiseFeedForward layers typically use the Swish activation function rather than the default ReLU.
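For reference, Swish is defined as x · sigmoid(x). A self-contained PyTorch sketch (ESPnet ships its own Swish module, but the plain definition below is enough to see how it differs from ReLU):

```python
import torch

def swish(x: torch.Tensor) -> torch.Tensor:
    # Swish as used in the Conformer paper: x * sigmoid(x).
    # Unlike ReLU, it is smooth and non-monotonic around zero.
    return x * torch.sigmoid(x)

x = torch.linspace(-3, 3, 7)
print(swish(x))       # smooth everywhere, slightly negative for x < 0
print(torch.relu(x))  # hard zero for x < 0
```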
See also
Original Conformer paper: https://arxiv.org/abs/2005.08100