Fixing sampler logic for ddp with iterable dataset #1734

twangnyc · 2020-05-05T04:23:01Z

Before submitting

Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the docs?
Did you write any new necessary tests?
If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes DDP training with an iterable dataset.
Currently, when an iterable dataset is used in DDP without explicitly setting a sampler, self.train_dataloader.sampler defaults to PyTorch's _InfiniteConstantSampler. This sampler has no 'set_epoch' attribute, which produces the following error:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/root/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_data_parallel.py", line 372, in ddp_train
    self.run_pretrain_routine(model)
  File "/root/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 914, in run_pretrain_routine
    self.train()
  File "/root/anaconda3/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 332, in train
    self.train_dataloader.sampler.set_epoch(epoch)
AttributeError: '_InfiniteConstantSampler' object has no attribute 'set_epoch'

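The bug is caused by operator precedence between 'or' and 'and' in the following code snippet. Because 'and' binds more tightly than 'or', the hasattr check only guards the use_horovod branch, so with use_ddp set the condition is true even when the sampler has no 'set_epoch':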

if self.use_ddp or self.use_horovod \
    and hasattr(self.train_dataloader.sampler, 'set_epoch'):

True or False and False = True
(True or False) and False = False
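The two evaluations above can be reproduced in a minimal, self-contained sketch. ConstantSampler is a hypothetical stand-in for a sampler without set_epoch, and the plain variables use_ddp and use_horovod stand in for the trainer attributes. The parenthesized form shows one way to express the intended grouping; it is not necessarily the exact merged diff.

```python
# Stand-in flags mirroring the failing case: DDP on, Horovod off.
use_ddp, use_horovod = True, False


class ConstantSampler:
    """Hypothetical stand-in for a sampler that lacks set_epoch."""


sampler = ConstantSampler()
has_set_epoch = hasattr(sampler, "set_epoch")  # False for this stand-in

# `and` binds tighter than `or`, so this parses as
# use_ddp or (use_horovod and has_set_epoch).
buggy = use_ddp or use_horovod and has_set_epoch

# Parenthesizing the `or` applies the hasattr check to both backends.
fixed = (use_ddp or use_horovod) and has_set_epoch

print(buggy, fixed)  # True False
```

With the buggy grouping the branch is entered and set_epoch is called on a sampler that does not define it. With the parenthesized grouping the call is skipped.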

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov · 2020-05-05T04:36:58Z

Codecov Report

Merging #1734 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #1734   +/-   ##
======================================
  Coverage      88%     88%           
======================================
  Files          69      69           
  Lines        4151    4151           
======================================
  Hits         3661    3661           
  Misses        490     490

ethanwharris

LGTM

Fixing logic

25d72ef

mergify bot requested a review from a team May 5, 2020 04:23

Borda added the bug Something isn't working label May 5, 2020

Borda approved these changes May 5, 2020

Borda added this to the 0.7.6 milestone May 5, 2020

Borda added the ready PRs ready to be merged label May 5, 2020

mergify bot requested a review from a team May 5, 2020 07:24

ethanwharris approved these changes May 5, 2020

mergify bot requested a review from a team May 5, 2020 11:01

williamFalcon merged commit d6a0375 into Lightning-AI:master May 5, 2020

twangnyc deleted the fix_sampler_attr_ddp branch May 11, 2020 06:34
