This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

Refactor distributed_launcher logic and simplify fb build #185

Closed

prigoyal
Contributor

Summary: We are bringing submitit into VISSL (thanks to QuentinDuval), which required moving `launch_distributed` into the core `vissl` library. Some refactoring was needed so that the fblearner workflow adapts accordingly.

Differential Revision: D26340992

QuentinDuval and others added 2 commits February 9, 2021 16:05
…he right amount of resources (facebookresearch#184)

Summary:
Improve the submission of distributed training on SLURM (rebase of previous PR facebookresearch#144):

- use the experiment configuration to deduce the number of nodes and GPUs to allocate on SLURM: the user no longer has to specify them manually, which avoids potential mistakes

- move the SLURM config from bash to the standard VISSL YAML config, so that SLURM options (like the VISSL options) can be configured with hydra overrides

- use the Python submitit library instead of bash to start SLURM jobs: this is a prerequisite for the points above, since moving to Python is what makes it possible to read the hydra VISSL configuration when starting the SLURM jobs
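
The three points above can be illustrated with a minimal sketch. It is not the code from this PR: the helper names `slurm_resources` and `submit_training`, the plain-dict config layout, the `log_dir` default, and the choice of one task per node are all assumptions made for illustration. Only `submitit.AutoExecutor`, `update_parameters`, and `submit` are real submitit calls.

```python
from typing import Any, Callable, Dict


def slurm_resources(config: Dict[str, Any]) -> Dict[str, int]:
    """Derive the SLURM allocation from the experiment config (hypothetical helper).

    Assumes the config exposes DISTRIBUTED.NUM_NODES and
    DISTRIBUTED.NUM_PROC_PER_NODE, with one process per GPU.
    """
    dist = config["DISTRIBUTED"]
    nodes = int(dist["NUM_NODES"])
    gpus_per_node = int(dist["NUM_PROC_PER_NODE"])
    if nodes < 1 or gpus_per_node < 1:
        raise ValueError("NUM_NODES and NUM_PROC_PER_NODE must be >= 1")
    return {"nodes": nodes, "gpus_per_node": gpus_per_node}


def submit_training(
    config: Dict[str, Any],
    train_fn: Callable[[Dict[str, Any]], Any],
    log_dir: str = "slurm_logs",  # assumed default, not from the PR
):
    """Submit train_fn(config) to SLURM through submitit (illustrative only)."""
    # Imported lazily so slurm_resources stays usable without submitit installed.
    import submitit

    resources = slurm_resources(config)
    executor = submitit.AutoExecutor(folder=log_dir)
    executor.update_parameters(
        nodes=resources["nodes"],
        gpus_per_node=resources["gpus_per_node"],
        tasks_per_node=1,  # assumption: one launcher task per node
    )
    # Returns a submitit job handle that can be polled for results.
    return executor.submit(train_fn, config)
```

Because the allocation is computed from the same config that drives training, an override that changes the node count changes the SLURM request as well, so the two cannot drift apart.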

Pull Request resolved: facebookresearch#184

Differential Revision: D26353975

Pulled By: QuentinDuval

fbshipit-source-id: 685c0346e437a8a1fd4e855086ad6639205810ea
Summary: We are bringing submitit into VISSL (thanks to QuentinDuval), which required moving `launch_distributed` into the core `vissl` library. Some refactoring was needed so that the fblearner workflow adapts accordingly.

Differential Revision: D26340992

fbshipit-source-id: 077594ff2698c6d07a1ca3c2c75b8a87ca74d1a5
@prigoyal prigoyal closed this Feb 10, 2021
@prigoyal prigoyal deleted the export-D26340992 branch February 10, 2021 00:06
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D26340992

@prigoyal prigoyal restored the export-D26340992 branch February 10, 2021 00:06
@prigoyal prigoyal reopened this Feb 10, 2021
@facebook-github-bot facebook-github-bot added the CLA Signed label Feb 10, 2021
@facebook-github-bot
Contributor

This pull request has been merged in e94858c.

Labels
CLA Signed, fb-exported, Merged

3 participants