Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix accumulate_grad_batches for last batch #2853

Merged
merged 18 commits into from
Aug 15, 2020
Merged

Fix accumulate_grad_batches for last batch #2853

merged 18 commits into from
Aug 15, 2020

Conversation

ydcjeff
Copy link
Contributor

@ydcjeff ydcjeff commented Aug 6, 2020

What does this PR do?

Fixes #1549

EDIT:

Issue to reproduce – Colab

https://colab.research.google.com/drive/1iMMKPMeghhfWD092lT3l9XKClY2YJYNQ?usp=sharing

Fixed issue to reproduce – Colab

https://colab.research.google.com/drive/1udmCrqYfJz2t8pEDi5pAIJKcyH_jHLLF?usp=sharing

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks
Copy link

pep8speaks commented Aug 6, 2020

Hello @ydcjeff! Thanks for updating this PR.

Line 594:17: W503 line break before binary operator
Line 742:17: W503 line break before binary operator
Line 884:25: W503 line break before binary operator

Comment last updated at 2020-08-15 13:46:13 UTC

@mergify mergify bot requested a review from a team August 6, 2020 17:53
@ydcjeff
Copy link
Contributor Author

ydcjeff commented Aug 6, 2020

Should I write new tests?

@Borda Borda added this to the 0.9.0 milestone Aug 6, 2020
@Borda
Copy link
Member

Borda commented Aug 6, 2020

Should I write new tests?

yes, please, add a new test for this case...

@mergify mergify bot requested a review from a team August 6, 2020 19:08
@Borda Borda added the bug Something isn't working label Aug 6, 2020
@ydcjeff ydcjeff changed the title Fix accumulate_grad_batches for last batch [WIP] Fix accumulate_grad_batches for last batch Aug 7, 2020
@ydcjeff
Copy link
Contributor Author

ydcjeff commented Aug 7, 2020

It is ready to review.

@ydcjeff ydcjeff changed the title [WIP] Fix accumulate_grad_batches for last batch Fix accumulate_grad_batches for last batch Aug 7, 2020
tests/trainer/test_trainer.py Outdated Show resolved Hide resolved
pytorch_lightning/trainer/training_loop.py Outdated Show resolved Hide resolved
pytorch_lightning/trainer/training_loop.py Outdated Show resolved Hide resolved
pytorch_lightning/trainer/training_loop.py Outdated Show resolved Hide resolved
@mergify mergify bot requested a review from a team August 7, 2020 19:41
@mergify mergify bot requested a review from a team August 7, 2020 21:54
@mergify mergify bot requested a review from a team August 8, 2020 13:57
@mergify
Copy link
Contributor

mergify bot commented Aug 8, 2020

This pull request is now in conflict... :(

@mergify mergify bot requested a review from a team August 15, 2020 10:06
@mergify
Copy link
Contributor

mergify bot commented Aug 15, 2020

This pull request is now in conflict... :(

Copy link
Contributor

@rohitgr7 rohitgr7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✌️. Not sure about circleci BuildError

@mergify mergify bot requested a review from a team August 15, 2020 13:02
@awaelchli
Copy link
Contributor

awaelchli commented Aug 15, 2020

Try to merge master to get rid of the build error

@rohitgr7
Copy link
Contributor

not sure if I merged it correctly but still the same BuildError.

@williamFalcon
Copy link
Contributor

ummm... ok, we seem to have broken this haha.

Can we go back to the commit without rebasing master?
Then rebase master.

@williamFalcon
Copy link
Contributor

ok... i need to run but. Maybe the easiest thing is to start from master again and apply your changes using master as the base?

i don't know why this is trying to pull changes from weeks ago

@awaelchli
Copy link
Contributor

awaelchli commented Aug 15, 2020

@rohitgr7 can force push the branch with the local changes, then it should be back to how it was.
I recommend merging master into the branch, otherwise you lose changes if you make a mistake in rebasing.

Copy link
Contributor

@awaelchli awaelchli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good now, this build config error must come from master.

@mergify mergify bot requested a review from a team August 15, 2020 14:01
@ydcjeff
Copy link
Contributor Author

ydcjeff commented Aug 15, 2020

not sure what just happened, but thanks for fixing @rohitgr7

@williamFalcon
Copy link
Contributor

awesome!

  1. does the test fail on master?
  2. fix pep8?

@rohitgr7
Copy link
Contributor

pep8 is not a problem here I think https://www.flake8rules.com/rules/W503.html

@Borda
Copy link
Member

Borda commented Aug 15, 2020

@ydcjeff seems you do not run CirclCI tests on our side which is the reason why all TPU test are missing

@ydcjeff
Copy link
Contributor Author

ydcjeff commented Aug 15, 2020

@Borda I fetched from upstream and worked on separate branch, not sure why TPU tests are missing.
Even there is circleci config

@williamFalcon
Copy link
Contributor

ok... i tested this on master and the tests do fail. let's merge so we can get rid of the circle ci thing and run tpu tests

@williamFalcon williamFalcon merged commit 73ebd10 into Lightning-AI:master Aug 15, 2020
@ydcjeff ydcjeff deleted the accumulate-grad-batches-1549 branch August 16, 2020 08:39
@awaelchli awaelchli mentioned this pull request Aug 16, 2020
7 tasks
ameliatqy pushed a commit to ameliatqy/pytorch-lightning that referenced this pull request Aug 17, 2020
* first attempt

* update changelog

* fix pep8 and tests

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* added new tests

* fixed tests

* Apply suggestions from code review

* used num_training_batches

* fixed pep8

* fixed with is_last_batch suggested by @awaelchli

* fixed with num_training_batches

* fixed with num_training_batches

* cleanup

* fix test and update docs

* fixed for alignment, update docs

* minor changes

* update doc

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
@Borda Borda modified the milestones: 0.9.0, 1.0.0 Aug 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unwanted accumulate_grad_batches behavior
6 participants