Is torch.distributed.all_reduce working as expected? #8

This line https://github.com/facebookresearch/barlowtwins/blob/main/main.py#L208 uses `torch.distributed.all_reduce` to sum the cross-correlation matrices across all GPUs. However, as far as I know, this op is not autograd-aware, so it is not meant to be used in a forward computation that will later be backpropagated through. To get a "correctly differentiable" distributed all-reduce, the official PyTorch documentation recommends the autograd-enabled primitives in `torch.distributed.nn.*`: https://pytorch.org/docs/stable/distributed.html#autograd-enabled-communication-primitives
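For comparison, here is a minimal sketch of the autograd-enabled alternative. It assumes a recent PyTorch release where `torch.distributed.nn.functional.all_reduce` is available. `init_single_process_group` and `cross_correlation` are hypothetical helpers, not code from the barlowtwins repo: the sketch starts a one-rank `gloo` group on CPU only so it can run locally, and it leaves out the batch normalization the repo applies before the matrix product.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.distributed.nn.functional as dist_fn  # autograd-enabled collectives


def init_single_process_group() -> None:
    """Hypothetical helper: start a 1-rank gloo group on CPU for local testing."""
    store_path = os.path.join(tempfile.mkdtemp(), "pg_store")
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{store_path}",
        rank=0,
        world_size=1,
    )


def cross_correlation(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for the repo's cross-correlation step (no batch norm)."""
    batch_size = z1.shape[0]
    c = z1.T @ z2 / batch_size
    # Unlike the in-place torch.distributed.all_reduce, this call returns a new
    # tensor that autograd records, so gradients flow back through the collective.
    return dist_fn.all_reduce(c, op=dist.ReduceOp.SUM)
```

With a single rank both calls produce the same values (a sum over one rank is the identity), so any difference between them only shows up in the backward pass of a multi-rank run.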
