Add interleaved pipeline and fix naive amp #80

Merged (1 commit) on Dec 20, 2021

@ver217 ver217 commented Dec 17, 2021

Overview

  1. Add an interleaved pipeline schedule.
  2. Update the pipeline model initializer.
  3. Fix Naive AMP.

Interleaved pipeline schedule and pipeline model initializer

We kept the original PipelineModelInitializer but renamed it to build_pipeline_model_from_cfg. We also added a new function, build_pipeline_model, which builds a pipeline-parallel model directly from a module; the user must pass a torch.nn.Sequential to this function.

Usage:

from colossalai.engine.schedule import InterleavedPipelineSchedule
from colossalai.builder import build_pipeline_model_from_cfg, build_pipeline_model

# from config
model = build_pipeline_model_from_cfg(model_cfg, num_model_chunks)

# from torch.nn.Sequential
model = build_pipeline_model(sequential_model, num_model_chunks)

schedule = InterleavedPipelineSchedule(num_microbatches, num_model_chunks)

Both build_pipeline_model and build_pipeline_model_from_cfg also accept a keyword argument, verbose. When verbose is set to True, the layers assigned to each rank are printed.
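To make the role of num_model_chunks more concrete, here is a minimal, library-free sketch of how an interleaved schedule is commonly laid out: the layers are split into pipeline_size * num_model_chunks contiguous parts, and each rank holds every pipeline_size-th part. This is an assumption based on the usual interleaved layout, not a statement of how build_pipeline_model actually partitions layers; the helper names split_evenly and chunks_for_rank are hypothetical and not part of colossalai.

```python
from typing import List


def split_evenly(num_layers: int, num_parts: int) -> List[range]:
    """Split layer indices 0..num_layers-1 into num_parts contiguous ranges."""
    base, extra = divmod(num_layers, num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `extra` parts take one additional layer each.
        size = base + (1 if i < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


def chunks_for_rank(num_layers: int, pipeline_size: int,
                    num_model_chunks: int, rank: int) -> List[List[int]]:
    """Return the layer indices of each model chunk held by `rank`.

    Assumed layout (hypothetical): rank r holds parts
    r, r + pipeline_size, r + 2 * pipeline_size, ...
    """
    parts = split_evenly(num_layers, pipeline_size * num_model_chunks)
    return [list(parts[rank + chunk * pipeline_size])
            for chunk in range(num_model_chunks)]


# 8 layers, 2 pipeline ranks, 2 model chunks per rank.
layout = {rank: chunks_for_rank(8, 2, 2, rank) for rank in range(2)}
for rank, chunks in layout.items():
    print(f"rank {rank}: {chunks}")
```

Under this assumed layout, each rank owns non-adjacent slices of the model, which is why the schedule needs to know num_model_chunks as well as num_microbatches.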

Fix Naive AMP

The old Naive AMP did not apply an all-reduce over the pipeline parallel process group when checking for infinite values, which led to a hang. I fixed this bug, so both pipeline parallelism and interleaved pipeline parallelism no longer get stuck with Naive AMP. Note that pipeline parallelism currently supports only Naive AMP.
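The idea behind the fix can be sketched with a small single-process simulation. It assumes that each pipeline rank checks only its own gradients for non-finite values, and that ranks must agree on the result before deciding whether to skip the optimizer step; the all-reduce is modeled by a plain function rather than a real torch.distributed call. The names local_found_inf and all_reduce_max are hypothetical and do not come from the PR.

```python
import math
from typing import List


def local_found_inf(grads: List[float]) -> bool:
    """True if this rank's gradients contain inf or NaN."""
    return any(not math.isfinite(g) for g in grads)


def all_reduce_max(flags: List[bool]) -> List[bool]:
    """Stand-in for an all-reduce with MAX over a process group.

    Every rank ends up with the same value: True if any rank saw
    a non-finite gradient.
    """
    reduced = any(flags)
    return [reduced for _ in flags]


# One gradient list per pipeline rank; only rank 1 overflows.
per_rank_grads = [[0.1, -0.2], [float("inf"), 0.3], [0.0, 0.5]]

local_flags = [local_found_inf(g) for g in per_rank_grads]
global_flags = all_reduce_max(local_flags)

print("local :", local_flags)   # ranks disagree without the reduction
print("global:", global_flags)  # ranks agree after the reduction
```

Without the reduction, only the overflowing rank would take the skip path while the others proceed, so the ranks would no longer execute matching communication steps, which is one plausible way a hang can arise.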

@ver217 ver217 added bug Something isn't working enhancement New feature or request labels Dec 17, 2021
@ver217 ver217 requested a review from FrankLeeeee December 17, 2021 05:34
@ver217 ver217 requested review from FrankLeeeee and removed request for FrankLeeeee December 20, 2021 06:50
@ver217 ver217 requested review from FrankLeeeee and removed request for FrankLeeeee December 20, 2021 07:00
@FrankLeeeee FrankLeeeee merged commit 8f02a88 into main Dec 20, 2021
@ver217 ver217 deleted the feature/pipeline branch December 21, 2021 03:01