Closed
Description
When using DDP with multiple GPUs, each validation and test loop is called with the entire validation/test dataset on every GPU.
The expected behavior is that the dataset is divided appropriately across the GPUs.
I am using current master (cloned Mar 14), Ubuntu 19.10, CUDA 10.1, Python 3.7.5, PyTorch 1.4, in a venv environment.
The problem appears to be in auto_add_sampler() in data_loading.py: it does not create a DistributedSampler for the validation or test datasets.
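Until the sampler injection is fixed, one possible workaround is to shard the eval set manually in the dataloader hooks. The sketch below is only an illustration: make_eval_loader is a hypothetical helper (not a Lightning API), the TensorDataset and batch size are placeholders, and it assumes the DDP process group is already initialized when the dataloaders are built (otherwise it falls back to an unsharded loader).

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def make_eval_loader(dataset, batch_size=32):
    """Build an eval DataLoader that is sharded per process under DDP (sketch)."""
    sampler = None
    if dist.is_available() and dist.is_initialized():
        # One shard per process: each GPU sees roughly len(dataset) / world_size samples.
        sampler = DistributedSampler(dataset, shuffle=False)
    # When a sampler is supplied, shuffle must be left off.
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)


# Example: call this from val_dataloader() / test_dataloader() in a LightningModule.
val_set = TensorDataset(torch.randn(1000, 16), torch.zeros(1000, dtype=torch.long))
val_loader = make_eval_loader(val_set)
```

Note that sharding the eval set this way means each GPU computes metrics on a different subset, so metrics need to be reduced across processes to get dataset-level numbers.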
Activity
sneiman commented on Mar 16, 2020
Latest pull (about an hour ago) no longer shows this behavior. Closing.
sneiman commented on Mar 17, 2020
Sorry - this issue still exists in some configurations. My proposed fix is not the whole picture. Still investigating; will provide a reproducible example.
sneiman commented on Mar 17, 2020
Testing underway. Will make a PR tomorrow.
sneiman commented on Mar 17, 2020
Don't want to clutter up the PR queue if no one is interested in this. Let me know ...
Borda commented on Mar 18, 2020
That sounds like a good contribution to me... mind sending a PR?
Any suggestions, @PyTorchLightning/core-contributors?
As a technical note: when you refer to some master state, please use the commit hash, as there can be multiple commits each day...
sneiman commented on Mar 18, 2020
Will do on both the PR and the hash reference.