Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero-init the residual branch in resnet #498

Merged
merged 3 commits into from
Dec 6, 2018
Merged

Zero-init the residual branch in resnet #498

merged 3 commits into from
Dec 6, 2018

Conversation

ppwwyyxx
Copy link
Contributor

No description provided.

@ppwwyyxx
Copy link
Contributor Author

@fmassa Could you take a look?

@fmassa
Copy link
Member

fmassa commented Jun 19, 2018

Hey,

I think the only way I'd accept such modifications would be if we pass extra arguments into the constructor to modify the way we initialize the weights.

I don't want to break backwards-compatibility as lots of people are using this model already.

@ppwwyyxx
Copy link
Contributor Author

This only makes the model better without any user-side changes. But sure I can make it optional.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Jun 19, 2018

Just wondering @ppwwyyxx could you benchmark the improvement with pytorch (or using tensorpack) ?

Edit: As I see you implemented this here in tensorpack

@ppwwyyxx
Copy link
Contributor Author

ppwwyyxx commented Jun 19, 2018

I have. On ResNet50 I'm able to get 23.6~23.7 error with pytorch. Without this change it's about 24.1

@ppwwyyxx
Copy link
Contributor Author

ppwwyyxx commented Jul 8, 2018

@fmassa I added it as an argument

Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@fmassa fmassa merged commit 4cc6f45 into pytorch:master Dec 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants