Make training_epoch_end behave like validation_epoch_end #1357
Conversation
Hello @jbschiratti! Thanks for updating this PR.
Comment last updated at 2020-04-03 11:15:16 UTC
Great catch! thx 🚀 pls add a note to changelog...
This pull request is now in conflict... :(
Thanks for the PR. Unfortunately there's a potential memory leak here. The `process_output` function doesn't detach the tensors, so when I run your code, each tensor in the OrderedDict given to `on_training_epoch_end` still has an associated `grad_fn`. To resolve this, we need a way to detach any tensors in the output, perhaps based on `process_output`.
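To illustrate the concern, here is a minimal sketch of the kind of recursive detach the comment suggests. The helper name `detach_tensors` is hypothetical (it is not the actual `process_output` code); it duck-types on `.detach()` so the idea is visible without depending on a specific tensor library.

```python
def detach_tensors(output):
    """Recursively detach any tensor-like values (anything exposing a
    .detach() method) inside nested dicts/lists/tuples, so the autograd
    graph attached via grad_fn is not kept alive across the epoch."""
    if hasattr(output, "detach"):
        return output.detach()
    if isinstance(output, dict):
        # preserves dict subclasses such as OrderedDict
        return type(output)((k, detach_tensors(v)) for k, v in output.items())
    if isinstance(output, (list, tuple)):
        return type(output)(detach_tensors(v) for v in output)
    return output
```

Applied to the dict of batch outputs before it is handed to the epoch-end hook, this would leave plain values untouched while severing the computation graph on every tensor it finds.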
CircleCI failed on
Thanks @Borda, @ethanwharris and @justusschock 👍🎉!
This is great! One obvious thing that was missing is now finally here :))
I would like to point out some things that were overlooked.
@awaelchli thank you for your careful reading, may I apply your suggestions in a follow-up PR?
yeah sure :)
well, you would need to fork
* Doc fixes from #1357 (awaelchli's comments) + changelog. * Fix indentation. * Add blank line to fix doc build?
(Lightning-AI#1357) * Make training_epoch_end behave like validation_epoch_end + minor fixes in docstrings. * Minor fixes (Borda's comments). * Detach tensors in batch_output (to avoid possible memory leak) + doc fix. Co-authored-by: Jean-Baptiste SCHIRATTI <jean-baptisteschiratti@MacBook-Pro-de-Jean-Baptiste.local>
What does this PR do?
This PR fixes #914.
The class `LightningModule` currently implements a `validation_epoch_end` method but no `training_epoch_end` method. The `CHANGELOG.md` file suggests that, from version 0.7.0, the `training_end` (now `training_step_end`) method is to be renamed `training_epoch_end`. However, this is not satisfactory, as `training_epoch_end` would still not behave like `validation_epoch_end`.

This PR ensures that both `training_epoch_end` and `validation_epoch_end` have the same behavior (for instance, allowing the user to log metrics only once at the end of each epoch).

PR review
Anyone in the community is free to review the PR once the tests have passed 👍
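With this change, a user can aggregate training outputs once per epoch, just as in validation. The sketch below shows the shape of such a hook on a plain Python stand-in class (the `pl.LightningModule` base class and Trainer wiring are omitted; the averaging logic and `avg_train_loss` key are illustrative, not part of this PR):

```python
class MyModule:
    """Stand-in for a LightningModule, showing only the new hook."""

    def training_epoch_end(self, outputs):
        # `outputs` is the list of per-batch dicts returned by
        # training_step over the whole epoch, mirroring the list
        # validation_epoch_end receives from validation_step.
        avg_loss = sum(o["loss"] for o in outputs) / len(outputs)
        # Return a 'log' dict once per epoch instead of once per batch.
        return {"log": {"avg_train_loss": avg_loss}}
```

Because the hook sees every batch's output at once, metrics can be computed over the full epoch rather than averaged incrementally per step.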