Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metric aggregation testing #3517

Merged

Conversation

SkafteNicki
Copy link
Member

@SkafteNicki SkafteNicki commented Sep 16, 2020

What does this PR do?

With PR #3245 we changed the way we are doing ddp sync and also introduced aggregation over multiple batches. This PR will add test for all metrics such that we are sure that they work in the following cases:

  • Aggregation over multiple devices (aka running in ddp mode)
  • Aggregation over multiple equal size batches
  • Aggregation over multiple unequal size batches (important because a lot of assumptions that the final metric value can be computed as the mean over individual computed values break in this example)
  • Aggregation over multiple devices and multiple batches
  • Model integration

Should fix #3230

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks
Copy link

pep8speaks commented Sep 16, 2020

Hello @SkafteNicki! Thanks for updating this PR.

Line 282:121: E501 line too long (129 > 120 characters)

Comment last updated at 2020-10-01 11:32:20 UTC

@codecov
Copy link

codecov bot commented Sep 17, 2020

Codecov Report

Merging #3517 into master will increase coverage by 3%.
The diff coverage is 95%.

@@           Coverage Diff           @@
##           master   #3517    +/-   ##
=======================================
+ Coverage      87%     90%    +3%     
=======================================
  Files         110     110            
  Lines        8858    8818    -40     
=======================================
+ Hits         7731    7921   +190     
+ Misses       1127     897   -230     

@SkafteNicki
Copy link
Member Author

@Borda , @awaelchli , @justusschock would like your guys input on this.
The testing of metrics in ddp mode has long been overdue and after doing this to 7 of them clearly shows that they are broken when we start to aggregate (both multi device and multi batch).
It requires that most metrics divide their computations into pre-ddp and post-ddp (as I hinted at in #2528 and borda pointed to in #3510).
Lukily, we do have the interface for this: pre-ddp goes into forward method and post-ddp goes in the static method compute.
As stated I have done this to 7 of them and would like your guys input before continuing:

  1. Is the test setup too complicated? Espicially should testing be split into their respective files (classification, regression ect) or one large as in this PR?

  2. How to keep code duplication low with the functional backend? Right now I have basically implemented a return_state argument to the functional version of each metric which returns the metric-state right before ddp sync so only minimal computations needs to happen after ddp sync.

  3. How to split the work? I can do many of them but I guess one big PR will be too complicated to review.

@ananyahjha93
Copy link
Contributor

@SkafteNicki hey, I am am working on this as well, and I think there are some metrics which wouldn't work in that order. I think it is better to change the order to pre_ddp (input_convert) -> ddp_reduce/ddp_gather -> post_ddp (forward/output_convert). i.e. we need to gather statistics before computing the metrics themselves, rather than computing the metrics on individual GPUs and figuring out how metrics from different batches are combined. It won't just be mean/weighted mean or something along those lines for any specific metric. So, I have been changing this itself for metrics but was breaking a lot of things yesterday.

Copy link
Contributor

@awaelchli awaelchli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, LGTM on a high level. I'm not a metrics guy, so no comment on correctness :)

pytorch_lightning/metrics/functional/classification.py Outdated Show resolved Hide resolved
pytorch_lightning/metrics/functional/regression.py Outdated Show resolved Hide resolved
pytorch_lightning/metrics/regression.py Outdated Show resolved Hide resolved
pytorch_lightning/metrics/regression.py Outdated Show resolved Hide resolved
pytorch_lightning/metrics/regression.py Outdated Show resolved Hide resolved
pytorch_lightning/metrics/regression.py Outdated Show resolved Hide resolved
tests/metrics/test_metrics.py Show resolved Hide resolved
@mergify mergify bot requested a review from a team September 28, 2020 12:10
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Copy link
Member

@justusschock justusschock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just left some comments.

pytorch_lightning/metrics/classification.py Outdated Show resolved Hide resolved
@mergify mergify bot requested a review from a team September 28, 2020 13:09
@mergify
Copy link
Contributor

mergify bot commented Sep 29, 2020

This pull request is now in conflict... :(

@mergify mergify bot requested a review from a team September 29, 2020 14:59
@mergify mergify bot requested a review from a team September 29, 2020 15:12
pytorch_lightning/metrics/metric.py Show resolved Hide resolved
pytorch_lightning/metrics/metric.py Show resolved Hide resolved
pytorch_lightning/metrics/self_supervised.py Outdated Show resolved Hide resolved
tests/base/model_test_epoch_ends.py Outdated Show resolved Hide resolved
tests/metrics/test_aggregation.py Outdated Show resolved Hide resolved
tests/metrics/test_aggregation.py Show resolved Hide resolved
tests/metrics/test_aggregation.py Outdated Show resolved Hide resolved
@mergify mergify bot requested a review from a team September 29, 2020 15:18
@mergify
Copy link
Contributor

mergify bot commented Sep 30, 2020

This pull request is now in conflict... :(

@mergify
Copy link
Contributor

mergify bot commented Oct 1, 2020

This pull request is now in conflict... :(

@SkafteNicki SkafteNicki merged commit fe29028 into Lightning-AI:master Oct 1, 2020
@Borda Borda added this to the 0.10.0 milestone Oct 7, 2020
@SkafteNicki SkafteNicki deleted the metrics/aggregation_testing branch December 30, 2020 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants