Save test predictions on multiple GPUs #2926
Conversation
Hello @nateraw! Thanks for updating this PR.
Comment last updated at 2020-08-14 20:42:10 UTC
Codecov Report
```diff
@@           Coverage Diff            @@
##           master   #2926    +/-   ##
=======================================
+ Coverage      86%     88%     +2%
=======================================
  Files          81      81
  Lines        7554    7836    +282
=======================================
+ Hits         6476    6907    +431
+ Misses       1078     929    -149
```
```python
# Add step predictions to prediction collection to write later
do_write_predictions = is_result_obj and test_mode
if do_write_predictions:
```
- this should just stay inside the prediction object
- we already track prediction objects. No need to do anything explicit.
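A minimal sketch of that suggestion, assuming the collection exposes an add() the loop can call unconditionally; the write_dicts attribute and dict keys are hypothetical, not this PR's actual internals:

```python
import torch


class PredictionCollection:
    """Sketch: the collection owns all write bookkeeping, so the
    evaluation loop can call add() unconditionally on every output."""

    def __init__(self):
        self.predictions = {}  # {filename: {name: accumulated values}}

    def add(self, output):
        # No-op unless the step returned a result object that queued writes
        # (write_dicts is a hypothetical attribute for this sketch).
        write_dicts = getattr(output, 'write_dicts', None)
        if not write_dicts:
            return
        for d in write_dicts:
            file_preds = self.predictions.setdefault(d['filename'], {})
            name, values = d['name'], d['value']
            if name not in file_preds:
                file_preds[name] = values
            elif torch.is_tensor(values):
                # Merge tensors with existing data under the same keys.
                file_preds[name] = torch.cat([file_preds[name], values])
            else:
                file_preds[name].extend(values)
```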
```diff
@@ -388,6 +395,9 @@ def _evaluate(
         # log callback metrics
         self.__update_callback_metrics(eval_results, using_eval_result)
+
+        # Write predictions to disk if they're available.
+        predictions.to_disk()
```
this should just be an internal function of the prediction object.
we can't wait till the end of the epoch to write predictions because we will accumulate too much memory.
The write needs to happen at every batch.
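A minimal sketch of per-batch flushing under that constraint; the class name and shard layout here are illustrative, not what this PR implements:

```python
import torch


class BatchPredictionWriter:
    """Sketch: flush every batch to its own shard file so peak memory
    stays at one batch rather than a whole epoch of predictions."""

    def __init__(self, filename, rank):
        self.base = f'{filename}.rank_{rank}'
        self.batch_idx = 0

    def write_batch(self, values):
        # One shard per batch; merge (or just glob) the shards at the end.
        torch.save(values, f'{self.base}.batch_{self.batch_idx}.pt')
        self.batch_idx += 1
```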
how do we want to deal with writing to the cache file (w/ torch.save) if we want to append to it?
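torch.save has no append mode, so the usual options are (a) read-modify-write the cache each time, or (b) write per-batch shards, as in the sketch above, and merge them once at the end. A minimal sketch of (a), with the cache path hypothetical:

```python
import os
import torch

CACHE_PATH = 'predictions_cache.pt'  # hypothetical cache file


def append_to_cache(batch_values):
    # torch.save can't append in place, so load, extend, and re-save.
    cached = torch.load(CACHE_PATH) if os.path.exists(CACHE_PATH) else []
    cached.append(batch_values)
    torch.save(cached, CACHE_PATH)
```

Note that (a) re-serializes the whole cache on every batch; the shard approach avoids that at the cost of a merge step.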
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Save test predictions on multiple GPUs
What does this PR do?
This PR lets you save `test_step` predictions across multiple GPUs via a new function, `EvalResult.write(name, values, filename='predictions.txt')`. Here's what it looks like:
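The example that followed here in the original description didn't survive extraction; a minimal sketch of the intended usage, assuming the Lightning 0.9-era EvalResult API (the model itself is illustrative):

```python
import pytorch_lightning as pl
import torch
from torch import nn


class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.layer(x.view(x.size(0), -1))

    def test_step(self, batch, batch_idx):
        x, y = batch
        preds = self(x).argmax(dim=-1)
        result = pl.EvalResult()
        # Queue these predictions for writing; under DDP each rank
        # writes its own file, e.g. predictions_rank_0.txt.
        result.write('preds', preds, filename='predictions.txt')
        return result
```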
If you don't call `.write()` on your `EvalResult` objects, nothing should happen. If you do, each rank writes its own file in distributed mode; writing to `predictions.txt` on 2 GPUs, you'll end up with:

predictions_rank_0.txt --> 5000 predictions
predictions_rank_1.txt --> 5000 predictions

It's basically going to take whatever you send to `.write()` and `torch.cat` / `list.extend` it with existing data that has matching filename/name keys. In the background, the dictionary we're creating looks like this right now (reconstructed sketch below):

Fixes # (issue)
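The inline snapshot of that dictionary was lost in extraction; a plausible reconstruction from the description above (the concrete values are made up):

```python
import torch

# {filename: {name: accumulated values}}; tensors grow via torch.cat,
# lists via list.extend, whenever filename/name keys match.
predictions = {
    'predictions.txt': {
        'preds': torch.tensor([3, 7, 1, 4]),  # concatenated across batches
        'ids': ['a', 'b', 'c', 'd'],          # extended across batches
    },
}
```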
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃