Tags: zhangbo9674/pytorch
Document factory_kwargs in nn.Quantize + remove Attributes section (pytorch#59025) (pytorch#59045)
    Summary: The `factory_kwargs` kwarg was previously undocumented in `nn.Quantize`. Further, the `Attributes` section of the docs was improperly filled in, resulting in bad formatting. This section doesn't apply since `nn.Quantize` doesn't have parameters, so it has been removed.
    Pull Request resolved: pytorch#59025
    Reviewed By: anjali411
    Differential Revision: D28723889
    Pulled By: jbschlosser
    fbshipit-source-id: ba86429f66d511ac35042ebd9c6cc3da7b6b5805
    Co-authored-by: Joel Schlosser <jbschlosser@fb.com>
[release/1.9] Fix issues regarding binary_chekcout (pytorch#58495)
    Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Perform appropriate CUDA stream synchronization in distributed autograd. (pytorch#53929) (pytorch#54358)
    Summary: Pull Request resolved: pytorch#53929
    The local autograd engine performs appropriate stream synchronization between autograd nodes in the graph to ensure a consumer's stream is synchronized with the producer's stream before executing the consumer. However, in the case of distributed autograd, the SendRpcBackward function receives gradients over the wire, and TensorPipe uses its own pool of streams for this purpose. As a result, the tensors are received on TensorPipe's stream pool, but SendRpcBackward runs on a different stream during the backward pass, and there is no logic to synchronize these streams.
    To fix this, I've enhanced DistEngine to synchronize these streams appropriately when it receives grads over the wire.
    ghstack-source-id: 124055277
    (Note: this ignores all push blocking failures!)
    Test Plan: 1) Added unit test which reproduced the issue. 2) waitforbuildbot.
    Reviewed By: walterddr, wanchaol
    Differential Revision: D27025307
    fbshipit-source-id: 2944854e688e001cb3989d2741727b30d9278414
    Co-authored-by: Pritam Damania <pritam.damania@fb.com>
third_party: Update kineto to fix libtorch builds (pytorch#54205)
    Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Fix hipify_python (pytorch#52756)
    Co-authored-by: rraminen <rraminen@amd.com>
    Co-authored-by: Nikita Shulga <nshulga@fb.com>
[1.8] Fix onnx mixed precision export for layernorm & fuseLogSoftmaxNllLoss (pytorch#52510)
    Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>