zero grad params on initialization #484
Comments
The training loop is usually run with model:zeroGradParameters() called before every mini-batch; you need to zero the gradient buffers for correctness anyway. Initializing them with zeros would likely hide bugs caused by forgetting to zero the gradBuffers every iteration.
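For context, a minimal sketch of the canonical loop being described, assuming a `model` (an nn.Module) and a `criterion` have already been constructed; `inputs`, `targets`, `nEpochs`, `nBatches`, and `learningRate` are hypothetical placeholders:

```lua
require 'nn'

for epoch = 1, nEpochs do
  for i = 1, nBatches do
    local input, target = inputs[i], targets[i]
    model:zeroGradParameters()            -- clear stale gradients first
    local output = model:forward(input)
    local loss = criterion:forward(output, target)
    local gradOutput = criterion:backward(output, target)
    model:backward(input, gradOutput)     -- accumulates into gradWeight/gradBias
    model:updateParameters(learningRate)  -- p = p - lr * gradParams
  end
end
```

Because `backward` accumulates rather than overwrites gradients, skipping `zeroGradParameters()` means each step adds the new gradients on top of the previous batch's.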
Yup, I get that this is the standard form. But intuitively, you'd expect this to work just as well: criterion:forward(model:forward(...), target). This probably isn't a big deal either way (I came across it randomly, not as a bug), but since all the other fields get initialized for you, it seems this one should be too.
This has come up several times in the past. Maybe we should initialize gradWeight / gradBias with NaNs.
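A hedged sketch of that NaN-initialization idea (not existing torch/nn behavior): filling the gradient buffers with NaN at construction would make a forgotten zeroGradParameters() poison the parameters immediately, rather than letting stale gradients corrupt training silently. `nn.Linear(10, 5)` is just an example module:

```lua
require 'nn'

local linear = nn.Linear(10, 5)   -- any module with gradWeight/gradBias

-- Hypothetical construction-time behavior: seed the gradient buffers
-- with NaN (0/0 yields NaN for Lua doubles).
linear.gradWeight:fill(0/0)
linear.gradBias:fill(0/0)

-- A correct loop is unaffected, because zeroGradParameters() overwrites
-- the NaNs before the first backward pass:
linear:zeroGradParameters()

-- A buggy loop that skips zeroGradParameters() would accumulate onto NaN,
-- and updateParameters() would turn the weights into NaN on step one,
-- failing loudly instead of training on garbage.
```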
I had assumed zeros were the default and happened to write my optimisation loop the other way around, so +1 for initialising with NaNs (for the reasoning given above).
Is there a reason the grad params don't start zeroed when a module is initialized? This seems dangerous, and since initialization happens only once, zeroing them there would not be a meaningful performance cost.