
Model Parallelism and Big Models #8771

Open
@alexorona

Description

🚀 Feature request

This is a discussion issue for training/fine-tuning very large transformer models. Recently, model parallelism was added for gpt2 and t5. The current implementation is for PyTorch only and requires manually modifying the model classes for each model. Possible routes (thanks to @stas00 for identifying these):

  • fairscale, to avoid a separate per-model implementation
  • deepspeed, to potentially enable training even larger models
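
For reference, below is a minimal usage sketch of the manual model-parallel API that currently exists for GPT-2 (`parallelize()`/`deparallelize()` with a `device_map`). The two-GPU layer split is only illustrative and assumes a machine with at least two CUDA devices:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Map transformer blocks to devices: blocks 0-5 on GPU 0, blocks 6-11 on GPU 1
# (gpt2-small has 12 blocks; this particular split is just an example).
device_map = {
    0: list(range(0, 6)),
    1: list(range(6, 12)),
}
model.parallelize(device_map)

# Inputs must live on the first device, where the embeddings were placed.
inputs = tokenizer("Model parallelism lets us", return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs, max_length=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

model.deparallelize()  # move the model back to the CPU when done
```

The point of the discussion is that this `device_map` plumbing has to be hand-written inside each model class, which is what fairscale or deepspeed could avoid.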
