Remove check_checkpoint_callback #7724

Merged: carmocca merged 37 commits into master from refactor/remove-check-ckpt-callback on Jul 19, 2021

@carmocca (Contributor) commented May 26, 2021

What does this PR do?

Changes:

  • Replace the check_checkpoint_callback call in training_loop.on_train_end with an on_train_end hook implemented in ModelCheckpoint.
  • Always rely on the ModelCheckpoint hooks to save.

This resolves a bug where an extra checkpoint was saved at the end of training if the val_check_interval did not align with the training batches. In that case, a checkpoint was always saved as if save_last was set to True even if it was not. This change is reflected in the tests.
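
The behavior change described above can be illustrated with a toy model. This is a minimal sketch, not Lightning code: `ToyCheckpoint`, `run`, the `legacy` flag, and the checkpoint file names are all hypothetical, and it assumes the new hook saves at the end of training only when save_last is set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyCheckpoint:
    """Hypothetical stand-in for ModelCheckpoint (not the Lightning class)."""

    save_last: bool = False
    saved: List[str] = field(default_factory=list)

    def on_validation_end(self, step: int) -> None:
        # Regular save, triggered every time validation runs.
        self.saved.append(f"step={step}.ckpt")

    def on_train_end(self) -> None:
        # Assumed new behavior: the callback decides, and only saves
        # at the end of training when save_last was requested.
        if self.save_last:
            self.saved.append("last.ckpt")


def run(num_batches: int, val_every: int, ckpt: ToyCheckpoint, legacy: bool) -> List[str]:
    """Toy training loop; `legacy` mimics the behavior described as the bug."""
    for step in range(1, num_batches + 1):
        if step % val_every == 0:
            ckpt.on_validation_end(step)
    if legacy:
        # Old path: the loop forces a save when validation did not line up
        # with the final batch, regardless of save_last.
        if num_batches % val_every != 0:
            ckpt.saved.append(f"forced-step={num_batches}.ckpt")
    else:
        # New path: the loop only fires the hook.
        ckpt.on_train_end()
    return ckpt.saved


if __name__ == "__main__":
    print(run(10, 4, ToyCheckpoint(), legacy=True))
    print(run(10, 4, ToyCheckpoint(), legacy=False))
    print(run(10, 4, ToyCheckpoint(save_last=True), legacy=False))
```

With 10 batches and validation every 4 batches, the legacy path in this sketch records a third, unrequested checkpoint, while the hook-based path records one only when save_last is enabled.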

Fixes #6672

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n/a] Did you make sure to update the documentation with your changes? (if necessary)
  • [n/a] Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • [n/a] Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

@pep8speaks commented May 26, 2021

Hello @carmocca! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-07-19 10:52:34 UTC

@carmocca marked this pull request as draft May 26, 2021 13:27

@codecov (bot) commented May 26, 2021

Codecov Report

Merging #7724 (6ad0a5e) into master (999ef5c) will decrease coverage by 5%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #7724    +/-   ##
=======================================
- Coverage      93%     88%    -5%     
=======================================
  Files         217     217            
  Lines       14227   14218     -9     
=======================================
- Hits        13201   12530   -671     
- Misses       1026    1688   +662     

@mergify (bot) removed the has conflicts label Jul 14, 2021

@tchaton (Contributor) left a comment

LGTM!

@awaelchli added the checkpointing and ready labels Jul 15, 2021
@awaelchli enabled auto-merge (squash) July 15, 2021 17:56
@mergify (bot) removed the has conflicts label Jul 15, 2021
@carmocca disabled auto-merge July 16, 2021 01:02
@carmocca force-pushed the refactor/remove-check-ckpt-callback branch from 84b05d1 to b709a8f July 16, 2021 01:30
@mergify (bot) removed the has conflicts label Jul 19, 2021
@carmocca enabled auto-merge (squash) July 19, 2021 10:53
@carmocca merged commit 710df39 into master Jul 19, 2021
@carmocca deleted the refactor/remove-check-ckpt-callback branch July 19, 2021 11:29
Labels: checkpointing (Related to checkpointing), priority: 0 (High priority task), ready (PRs ready to be merged)
Development

Successfully merging this pull request may close these issues.

[RFC] Training Loop Checkpoint Consolidation
8 participants