zero grad params on initialization #484

Open
willwhitney opened this issue Nov 19, 2015 · 4 comments
Comments

@willwhitney

th> lin = nn.Linear(2,2)
th> p1, gp1 = lin:getParameters()
th> p1
 0.4611
-0.6737
-0.6769
 0.3312
-0.3065
-0.0952
[torch.DoubleTensor of size 6]
th> gp1
-2.6816e+154
-2.6816e+154
 2.9644e-323
 2.7813e-309
-2.6816e+154
-2.6816e+154
[torch.DoubleTensor of size 6]

Is there a reason grad params don't start zeroed when a module is initialized? This seems super dangerous, and since initialization only happens once, zeroing them wouldn't be a big performance hit.
@soumith
Member

soumith commented Nov 19, 2015

the training loop is usually run as:

model:zeroGradParameters()
criterion:forward(model:forward(...), target)
model:backward(...)
optimization

after every mini-batch, you need to zero the gradient buffers for correctness anyway. Initializing them with zeros would likely hide bugs caused by forgetting to zero the grad buffers on every iteration...
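The loop above can be sketched concretely in Lua. This is a minimal illustration, not code from the thread: the criterion, data, and learning rate are all made up for the example, and a plain `updateParameters` SGD step stands in for the "optimization" line.

```lua
-- Sketch of the standard Torch training loop described above.
require 'nn'

local model = nn.Linear(2, 2)
local criterion = nn.MSECriterion()       -- illustrative choice of loss
local input = torch.randn(2)              -- dummy data for the sketch
local target = torch.randn(2)

for i = 1, 100 do
  model:zeroGradParameters()              -- clear stale gradients before backward
  local output = model:forward(input)
  local loss = criterion:forward(output, target)
  local gradOutput = criterion:backward(output, target)
  model:backward(input, gradOutput)       -- accumulates into gradWeight / gradBias
  model:updateParameters(0.1)             -- plain SGD step, learning rate 0.1
end
```

Because `backward` accumulates rather than overwrites gradients, skipping the `zeroGradParameters()` call makes each step use the sum of all previous gradients, which is exactly the bug zero-initialization would mask.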

@willwhitney
Author

Yup, I get that this is the standard form. But intuitively, you'd expect this ordering to work just as well:

criterion:forward(model:forward(...), target)
model:backward(...)
optimization
model:zeroGradParameters()

This probably isn't a big deal either way (I came across it by chance, not as a bug), but since all the other fields get initialized for you, it seems like this one should be too.

@soumith
Member

soumith commented Dec 19, 2015

this has come up several times in the past. Maybe we should initialize gradWeight / gradBias with NaNs.

@Kaixhin
Contributor

Kaixhin commented Dec 29, 2015

I had assumed gradients were zero-initialised and happened to write my optimisation loop the latter way around, so +1 for initialising with NaNs (for the reasoning given above).
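For what the NaN proposal would look like in practice, here is a hedged sketch (not an actual patch from this thread): fill the flattened gradient storage with NaN right after construction, so a forgotten `zeroGradParameters()` poisons the parameter update visibly instead of silently reusing uninitialized memory.

```lua
-- Sketch: NaN-initialize the gradient buffers of a freshly built module.
require 'nn'

local lin = nn.Linear(2, 2)
local params, gradParams = lin:getParameters()
gradParams:fill(0/0)   -- 0/0 evaluates to NaN for Lua doubles

-- Any update that consumes gradParams without first calling
-- lin:zeroGradParameters() now turns the parameters into NaNs,
-- which surfaces the bug immediately rather than hiding it.
```

Since NaN propagates through every arithmetic operation, the first missed zeroing would corrupt the loss on the very next forward pass, making the mistake easy to spot.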
