Add limit_all_gathers=True (#346)

Summary: Add rate limiter support for FSDP all-gathers. Per ankitade and edward-io, this can provide up to a 2x speedup for a 10B-parameter model.

Pull Request resolved: #346
Reviewed By: edward-io
Differential Revision: D40050254
Pulled By: rohan-varma
fbshipit-source-id: f1b3be78be7d1a8c580a9612fc813a62efeed540
facebookresearch · Oct 4, 2022 · 786f31b
1 parent 905cfbd
Showing 1 changed file with 1 addition and 0 deletions.
diff --git a/examples/flava/native/train.py b/examples/flava/native/train.py
@@ -196,6 +196,7 @@ def create_model(self) -> torch.nn.Module:
                         FLAVATransformerWithoutEmbeddings,
                     },
                 ),
+                limit_all_gathers=True,
             )
 
             print0(f"after FSDP {torch.cuda.memory_allocated()/1024**3:.3} GB")
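The `limit_all_gathers=True` flag in the diff above turns on FSDP's all-gather rate limiter. As we read PyTorch's FSDP documentation, the flag makes the CPU thread synchronize so that only about two consecutive FSDP instances hold unsharded parameters at once: the one currently computing and the one whose all-gather is prefetched. The toy model below is a hypothetical sketch of that idea, not FSDP's implementation. It assumes the CPU issues all-gathers much faster than the GPU consumes them, and `simulate_peak_inflight` is an illustrative helper name, not a PyTorch API.

```python
from collections import deque
from typing import Optional


def simulate_peak_inflight(num_layers: int, max_inflight: Optional[int]) -> int:
    """Return the peak number of layers whose unsharded parameters are
    resident at the same time (hypothetical toy model, not FSDP code).

    Assumption: the CPU thread runs far ahead of the GPU, so without a
    limit it issues every all-gather before any layer frees its memory.
    With a limit, the CPU waits until the oldest gathered layer has
    finished computing and freed its parameters before issuing another.
    """
    inflight: deque = deque()  # layers gathered but not yet freed
    peak = 0
    for layer in range(num_layers):
        if max_inflight is not None:
            # Rate limiter: block until the GPU frees the oldest layer.
            while len(inflight) >= max_inflight:
                inflight.popleft()
        inflight.append(layer)  # issue this layer's all-gather
        peak = max(peak, len(inflight))
    return peak


if __name__ == "__main__":
    # Without a limit, every layer's parameters can be gathered at once.
    print(simulate_peak_inflight(10, None))
    # With a limit of 2, at most two layers are resident at a time.
    print(simulate_peak_inflight(10, 2))
```

In this model the unlimited case peaks at all 10 layers, while the limited case never exceeds 2. That bounded residency is the memory behavior the flag is meant to provide; the speedup reported in the commit message is the authors' measurement and is not reproduced by this sketch.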