
multi-gpu ddp calls validation and testing loops too many times #1161

Closed
@sneiman

Description

When using DDP with multiple GPUs, the validation and test loops on every GPU run over the entire validation (or test) dataset.

Expected behavior: the dataset is divided across the GPUs so that each GPU processes only its own share.

I am using current master (cloned Mar 14), Ubuntu 19.10, CUDA 10.1, Python 3.7.5, PyTorch 1.4, in a venv environment.

The problem appears to be in auto_add_sampler() in data_loading.py: it does not create a DistributedSampler for the validation or test datasets.
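For illustration, here is a minimal pure-Python sketch of what "divided appropriately across the GPUs" means when shuffling is off. It models the strided split used by PyTorch's DistributedSampler with no dropping of samples: the index list is padded by wrapping around so every rank gets the same count, and rank r takes every num_replicas-th index starting at r. The helper name `shard_indices` is hypothetical; this is not Lightning's or PyTorch's actual code.

```python
import math
from typing import List


def shard_indices(dataset_len: int, num_replicas: int, rank: int) -> List[int]:
    """Hypothetical helper: indices one rank would evaluate under a
    DistributedSampler-style split (shuffle=False, no samples dropped)."""
    if num_replicas < 1 or not 0 <= rank < num_replicas:
        raise ValueError("need num_replicas >= 1 and 0 <= rank < num_replicas")
    # Every rank gets the same number of samples, rounded up.
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by wrapping around to the start so the total divides evenly.
    padding = total_size - dataset_len
    indices += [indices[i % dataset_len] for i in range(padding)]
    # Strided slice: rank r takes r, r + num_replicas, r + 2 * num_replicas, ...
    return indices[rank:total_size:num_replicas]


if __name__ == "__main__":
    n, world_size = 10, 4
    for r in range(world_size):
        # Without a distributed sampler, each rank would iterate range(n) instead.
        print(f"rank {r}: sharded={shard_indices(n, world_size, r)}, "
              f"unsharded={list(range(n))}")
```

With 10 validation samples on 4 GPUs, each rank evaluates 3 samples (12 in total, two of them wrap-around duplicates) instead of all 10 (40 in total), which is the gap between the expected and observed behavior described above.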


sneiman (Contributor, Author) commented on Mar 16, 2020

With the latest pull (1 hour ago), I no longer see this behavior. Closing.

sneiman (Contributor, Author) commented on Mar 17, 2020

Sorry, this issue still exists in some configurations. My proposed fix is not the whole picture. Still investigating; I will provide a reproducible example.

sneiman (Contributor, Author) commented on Mar 17, 2020

Testing is underway. Will make a PR tomorrow.

sneiman (Contributor, Author) commented on Mar 17, 2020

Don't want to clutter up the PR queue if no one is interested in this. Let me know...

Changed the title from "multi-gpu ddp calls validation loop too many times" to "multi-gpu ddp calls validation and testing loops too many times" on Mar 17, 2020.
added this to the 0.7.2 milestone on Mar 18, 2020
Borda (Member) commented on Mar 18, 2020

That sounds like a good contribution to me... would you mind sending a PR?
Any suggestions, @PyTorchLightning/core-contributors?
In a technical note, when you refer to some state of master, please use the commit hash, as there can be multiple commits each day...

sneiman (Contributor, Author) commented on Mar 18, 2020

Will do on both: the PR and the commit hash reference.


Metadata

Repository: Lightning-AI/pytorch-lightning
Assignees: No one assigned
Labels: bug (Something isn't working), help wanted (Open to be worked on)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests