This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
Closed
Description
This line, https://github.com/facebookresearch/barlowtwins/blob/main/main.py#L208, uses torch.distributed.all_reduce
to sum the correlation matrices across all GPUs. However, as far as I know, this op is not intended for a forward computation that a backward pass will later differentiate through: it performs the reduction in place, outside of autograd, so gradients do not flow back through the cross-GPU sum. To get a correctly differentiable distributed all-reduce, the official PyTorch documentation recommends the autograd-enabled communication primitives in torch.distributed.nn.*
: https://pytorch.org/docs/stable/distributed.html#autograd-enabled-communication-primitives
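To make the difference concrete, here is a minimal sketch contrasting the two collectives. It uses a single-process "gloo" group purely for illustration (the MASTER_ADDR/MASTER_PORT values are placeholders, not from the repo), so the "reduction" is a no-op, but the autograd behavior is the same as in the multi-GPU case:

```python
# Sketch: torch.distributed.all_reduce vs. torch.distributed.nn.all_reduce.
# Assumes a single-process "gloo" group set up purely for illustration.
import os
import torch
import torch.distributed as dist
import torch.distributed.nn  # autograd-enabled collectives

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

x = torch.ones(2, requires_grad=True)

# torch.distributed.all_reduce sums in place and records no grad_fn,
# so gradients cannot flow back through the cross-GPU reduction:
y = (x * 2).detach()
dist.all_reduce(y)
print(y.grad_fn)  # None: the collective is invisible to autograd

# torch.distributed.nn.all_reduce returns a new tensor that is part of
# the autograd graph; its backward also reduces the gradients across ranks:
z = torch.distributed.nn.all_reduce(x * 2)
z.sum().backward()
print(x.grad)  # gradients flowed through the collective

dist.destroy_process_group()
```

With world_size == 1 the summed value is unchanged, but only the second variant attaches a grad_fn, which is exactly what the loss computed from the summed correlation matrix needs.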