
[v1.8.0] Getting signal for release/1.8 #51995

Closed · wants to merge 61 commits

Conversation

seemethere
Member

No description provided.

Rong Rong and others added 2 commits February 9, 2021 10:16
Summary:
Fixes #50695

I checked locally that the concatenated license file appears at `torch-<version>.dist-info/LICENSE` in the wheel.

Pull Request resolved: #51634

Reviewed By: zhangguanheng66

Differential Revision: D26225550

Pulled By: walterddr

fbshipit-source-id: 830c59fb7aea0eb50b99e295edddad9edab6ba3a

Co-authored-by: mattip <matti.picus@gmail.com>
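The "checked locally" step above can be reproduced with the standard library alone, since wheels are ordinary zip archives. This is a hypothetical helper, not part of the PR:

```python
import zipfile

def find_license(wheel_path):
    """Return the .dist-info/LICENSE entries inside a wheel.

    Wheels are plain zip archives, so no packaging tooling is needed
    to verify where the concatenated license file ended up.
    """
    with zipfile.ZipFile(wheel_path) as wf:
        return [name for name in wf.namelist()
                if name.endswith(".dist-info/LICENSE")]
```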
…el (#51864) (#51890)

Summary:
Test begins to fail after the driver update

See #51863

Pull Request resolved: #51864

Reviewed By: bertmaher

Differential Revision: D26304018

Pulled By: malfet

fbshipit-source-id: bb7ade2f28d8cf8f847159d4ce92391f0794c258

Co-authored-by: Nikita Shulga <nshulga@fb.com>
@facebook-github-bot
Contributor

facebook-github-bot commented Feb 9, 2021


❌ 4 New Failures

As of commit 56b43f4 (more details on the Dr. CI page):

  • 4/4 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See CircleCI build pytorch_windows_vs2019_py36_cuda11.1_build (1/3)

Step: "Build"

ModuleNotFoundError: No module named 'yaml'
Building wheel torch-1.8.0a0+56b43f4
-- Building version 1.8.0a0+56b43f4
Traceback (most recent call last):
  File "C:\Users\circleci\project\setup.py", line 368, in check_pydep
    importlib.import_module(importname)
  File "C:\Jenkins\Miniconda3\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'yaml'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\circleci\project\setup.py", line 818, in <module>
    build_deps()
  File "C:\Users\circleci\project\setup.py", line 313, in build_deps
    check_pydep('yaml', 'pyyaml')
  File "C:\Users\circleci\project\setup.py", line 370, in check_pydep
    raise RuntimeError(missing_pydep.format(importname=importname, module=module))

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_build (2/3)

Step: "Build"

ModuleNotFoundError: No module named 'yaml' (same traceback as above)

See CircleCI build pytorch_windows_vs2019_py36_cpu_build (3/3)

Step: "Build"

ModuleNotFoundError: No module named 'yaml' (same traceback as above)
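The failing builds above all die in setup.py's `check_pydep`, which verifies build-time Python dependencies before compiling. A minimal sketch of that pattern (the message text here is an assumption, not PyTorch's exact wording):

```python
import importlib

# Sketch of the dependency check that raises in the tracebacks above;
# the error-message wording is illustrative, not PyTorch's.
MISSING_PYDEP = (
    "Missing build dependency: could not `import {importname}`.\n"
    "Please install it, e.g. `pip install {module}`."
)

def check_pydep(importname, module):
    try:
        importlib.import_module(importname)
    except ImportError:
        raise RuntimeError(
            MISSING_PYDEP.format(importname=importname, module=module))
```

In the CI logs, `check_pydep('yaml', 'pyyaml')` fails because the Windows build image is missing the PyYAML package.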

🕵️‍♀️ 1 failure not recognized by patterns:

The following CI failures may be due to changes from the PR
Job: CircleCI pytorch_macos_10_13_py3_test · Step: Test

This comment was automatically generated by Dr. CI.


James Reed and others added 4 commits February 9, 2021 15:44
Summary:
tries to fix doc_test

Pull Request resolved: #51825

Reviewed By: bertmaher

Differential Revision: D26295583

Pulled By: ngimel

fbshipit-source-id: 13f6e7f1675d810adfd4abd2d579e2812fe54c80
(cherry picked from commit 6c0bf28)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: Natalia Gimelshein <ngimel@fb.com>
Summary:
Fixes issue: #49728
========
A ternary if expression fails in TorchScript when the condition variable is annotated as Final.

Tests:
=======
pytest -k test_ternary_static_if test/test_jit.py

Pull Request resolved: #51789

Reviewed By: gmagogsfm

Differential Revision: D26278969

Pulled By: nikithamalgifb

fbshipit-source-id: 27d1383290211503188428fb2e8b7749f59ba16e

Co-authored-by: nikithamalgi <nikithamalgi@devvm146.prn0.facebook.com>
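In plain Python terms, the pattern that used to break under TorchScript compilation looks like this (a hedged sketch; the real repro lives in `test_ternary_static_if`, and the names here are illustrative):

```python
from typing import Final

USE_LEFT: Final[bool] = True  # condition variable annotated as Final

def pick(x: int, y: int) -> int:
    # The ternary whose TorchScript compilation failed when the
    # condition is Final; as plain Python it always worked.
    return x if USE_LEFT else y
```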
James Reed and others added 20 commits February 12, 2021 07:35
* Fix leaf modules in Transformer

[ghstack-poisoned]

* Fix tuple type annotations

[ghstack-poisoned]

* Generalize dict key check in `create-arg` (#51927)

Summary: Pull Request resolved: #51927

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26329655

Pulled By: jamesr66a

fbshipit-source-id: a15e7d9564551521af12a8fde1c7524856f0cbc2
Summary:
Pull Request resolved: #51878

`fake_quantize_per_tensor_affine_cachemask` and
`fake_quantize_per_channel_affine_cachemask` are implementation
details of `fake_quantize_per_tensor_affine` and
`fake_quantize_per_channel_affine`, removing the
Python bindings for them since there is no need to
expose them.

Test Plan:
```
python test/test_quantization.py TestFakeQuantize
```

Imported from OSS

Reviewed By: albanD, bugra

Differential Revision: D26314173

fbshipit-source-id: 733c93a3951453e739b6ed46b72fbad2244f6e97
(cherry picked from commit 33afb5f)
Summary:
Move definition of copysign template and specialization for
bfloat16/half types before first use of copysign in that file

Add comment explaining why this is necessary

Fixes #51889

Pull Request resolved: #51900

Reviewed By: walterddr

Differential Revision: D26321741

Pulled By: malfet

fbshipit-source-id: 888858b11d9708fa140fe9c0570cc5a24599205b
Summary:
This frequently happens when PyTorch compiled with CUDA support is installed on a machine that does not have NVIDIA GPUs.

Fixes #47038

Pull Request resolved: #51806

Reviewed By: ezyang

Differential Revision: D26285827

Pulled By: malfet

fbshipit-source-id: 9fd5e690d0135a2b219c1afa803fb69de9729f5e
…ation hooks (#52215)

Co-authored-by: wayi <wayi@devgpu238.prn2.facebook.com>
Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
Summary:
Pull Request resolved: #50180

Resolves the regression in
#49819 by adding a copy over a background
stream, similar to scatter. For internal use cases, this is gated by an env var that maintains the previous behavior when it is off.

Test Plan: CI

Reviewed By: mrshenli, ngimel

Differential Revision: D25818170

fbshipit-source-id: e50c76c035504b2a44e2be084701cee45c90df75
Co-authored-by: Vitaly Fedyunin <vitaly.fedyunin@gmail.com>
torch.vmap is a prototype feature and should not be in the stable
binary. This PR:

- Removes the `torch.vmap` API
- Removes the documentation entry for torch.vmap
- Changes the vmap tests to use an internal API instead of torch.vmap.

Test Plan:
- Tested locally (test_torch, test_autograd, test_type_hints, test_vmap), but also wait
for CI.
Summary:
Necessary to ensure correct link order, especially if libraries are
linked statically. Otherwise, one might run into:
```
/usr/bin/ld: /usr/local/cuda/lib64/libcublasLt_static.a(libcublasLt_static.a.o): undefined reference to symbol 'cudaStreamWaitEvent@libcudart.so.11.0'
/usr/local/cuda/lib64/libcudart.so: error adding symbols: DSO missing from command line
```

Pull Request resolved: #52243

Reviewed By: seemethere, ngimel

Differential Revision: D26437159

Pulled By: malfet

fbshipit-source-id: 33b8bb5040bda10537833f3ad737f535488452ea
…#52406)

Summary:
Pull Request resolved: #52151

CUDA 11.2 might not be as performant as we thought, so let's downgrade to
something we think is more performant.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D26408314

Pulled By: seemethere

fbshipit-source-id: e2446aa0115e2c2a79718b1fdfd9fccf2072822d
(cherry picked from commit a11650b)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Summary:
First part of #49886 to at least properly warn users of the current state

Pull Request resolved: #52311

Reviewed By: soulitzer

Differential Revision: D26495644

Pulled By: albanD

fbshipit-source-id: 72abdfe41cdbcc1ac739a536eb85d1aa4ba90897
Summary:
Pull Request resolved: #52389

Fixes: #49159

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26496319

Pulled By: gchanan

fbshipit-source-id: d385cd683ef09e0596a9875ce84d03e6e77acc93
Summary:
Fixes #39502

This PR adds support for exporting **fake_quantize_per_channel_affine** to a pair of QuantizeLinear and DequantizeLinear. Per tensor support was added by PR #39738.

`axis` attribute of QuantizeLinear and DequantizeLinear, which is required for per channel support, is added in opset13 added by onnx/onnx#2772.

[update 1/20/2021]: opset13 is being supported on master, the added function is now properly tested. Code also rebased to new master.

The function is also tested offline with the following code
```python
import torch
from torch import quantization

from torchvision import models
qat_resnet18 = models.resnet18(pretrained=True).eval().cuda()

qat_resnet18.qconfig = quantization.QConfig(
    activation=quantization.default_fake_quant, weight=quantization.default_per_channel_weight_fake_quant)
quantization.prepare_qat(qat_resnet18, inplace=True)
qat_resnet18.apply(quantization.enable_observer)
qat_resnet18.apply(quantization.enable_fake_quant)

dummy_input = torch.randn(16, 3, 224, 224).cuda()
_ = qat_resnet18(dummy_input)
for module in qat_resnet18.modules():
    if isinstance(module, quantization.FakeQuantize):
        module.calculate_qparams()
qat_resnet18.apply(quantization.disable_observer)

qat_resnet18.cuda()

input_names = [ "actual_input_1" ]
output_names = [ "output1" ]

torch.onnx.export(qat_resnet18, dummy_input, "quant_model.onnx", verbose=True, opset_version=13)
```
It can generate the desired graph.

Pull Request resolved: #42835

Reviewed By: houseroad

Differential Revision: D26293823

Pulled By: SplitInfinity

fbshipit-source-id: 300498a2e24b7731b12fa2fbdea4e73dde80e7ea

Co-authored-by: Hao Wu <skyw@users.noreply.github.com>
Summary:
This is getting tested by #52441.

Adds new config for macos arm64 to our binary builds.
Now stores artifacts for mac builds.

Pull Request resolved: #52443

Reviewed By: walterddr

Differential Revision: D26517330

Pulled By: janeyx99

fbshipit-source-id: 02774937a827bdd4c08486dc9f8fe63446917f1e
Co-authored-by: eellison <eellison@fb.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Co-authored-by: peterjc123 <peterghost86@gmail.com>
Co-authored-by: Jane Xu <janeyx@fb.com>
…llLoss (#52510)

Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>
Summary:
Fixes #{issue number}

Pull Request resolved: #51847

Reviewed By: albanD

Differential Revision: D26405678

Pulled By: malfet

fbshipit-source-id: 073b675225b48d1732771583f8f2473e0fdcf35c

Co-authored-by: Joe Zhu <jozh@microsoft.com>
@seemethere seemethere added this to the 1.8.0 milestone Feb 22, 2021
James Reed and others added 23 commits March 9, 2021 17:23
* [FX] Cherrypick docs fixes

* Update code links to point to 1.8
Summary:
For enabling amp in torch/xla, see [this](pytorch/xla#2654).

Pull Request resolved: #48570

Reviewed By: ezyang

Differential Revision: D26120627

Pulled By: ailzhang

fbshipit-source-id: 32627b17c04bfdad128624676ea9bf6f117bc97d

Co-authored-by: Chengji Yao <yaochengji@hotmail.com>
…53675)

Summary:
For #47027.

Some progress has been made in #50665, but in my testing trying to unwrap the circular dependencies is turning into a neverending quest.

This PR explicitly exports things in the top-level torch module without any semantic effect, in accordance with this py.typed library guidance: https://github.com/microsoft/pyright/blob/master/docs/typed-libraries.md#library-interface

It may be possible to do some of the other fixes just using `__all__` where needed, but `__all__` has a semantic effect I would like to further review. This PR at least fixes simple completions like `torch.nn` in Pylance/pyright.

Pull Request resolved: #52339

Reviewed By: smessmer

Differential Revision: D26694909

Pulled By: malfet

fbshipit-source-id: 99f2c6d0bf972afd4036df988e3acae857dde3e1

Co-authored-by: Jake Bailey <5341706+jakebailey@users.noreply.github.com>
Summary:
Pull Request resolved: #53133

In light of some issues where users were having trouble installing CUDA
specific versions of pytorch we should no longer have special privileges
for CUDA 10.2.

Recently I added scripts/release/promote/prep_binary_for_pypi.sh (#53056) to make
it so that we could theoretically promote any wheel we publish to
download.pytorch.org to pypi

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D26759823

Pulled By: seemethere

fbshipit-source-id: 2d2b29e7fef0f48c23f3c853bdca6144b7c61f22
(cherry picked from commit b8546bd)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Summary:
Pull Request resolved: #53508

closes #53501

Differential Revision: D26885263

Test Plan: Imported from OSS

Reviewed By: H-Huang

Pulled By: mrshenli

fbshipit-source-id: dd0493e6f179d93b518af8f082399cacb1c7cba6
Summary:
Fixes #51801
LSTMCell example updated

Pull Request resolved: #51983

Reviewed By: agolynski

Differential Revision: D26467104

Pulled By: zou3519

fbshipit-source-id: 31c8bf89b21cd2f748b2cc28a74169082d81503c

Co-authored-by: CarlosJose126 <43588143+CarlosJose126@users.noreply.github.com>
Summary:
Mitigates #53267

Pull Request resolved: #53274

Reviewed By: zhangguanheng66, ailzhang

Differential Revision: D26819702

Pulled By: cpuhrsch

fbshipit-source-id: 5b9b30db6f8fc414aa9f3c841429bf99bc927763

Co-authored-by: cpuhrsch <cpuhrsch@devvm2783.frc0.facebook.com>
* Add sample validation for LKJCholesky.log_prob

* Fix distributions which don't properly honor validate_args=False

A number of derived distributions use base distributions in their
implementation.

We add what we hope is a comprehensive test whether all distributions
actually honor skipping validation of arguments in log_prob and then
fix the bugs we found. These bugs are particularly cumbersome in
PyTorch 1.8 and master, where validate_args is turned on by default.
In addition, one might argue that validate_args is not performing
as well as it should when the default is not to validate but
validation is turned on at instantiation.

Arguably, there is another set of bugs or at least inconsistencies
when validation of inputs does not prevent invalid indices in
sample validation (when with validation an IndexError is raised
in the test). We would encourage the implementors to be more
ambitious when validation is turned on and amend sample validation
to throw a ValueError for consistency.

* additional fixes to distributions

* address failing tests

Co-authored-by: neerajprad <neerajprad@devvm903.atn0.facebook.com>
Co-authored-by: Thomas Viehmann <tv.code@beamnet.de>
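A plain-Python sketch (not the actual `torch.distributions` code) of the bug class being fixed: a derived distribution that does not forward `validate_args` to its base distribution keeps validating in `log_prob` even when the caller asked it not to.

```python
class Base:
    """Stand-in for a base distribution with argument validation."""

    def __init__(self, validate_args=True):
        self.validate_args = validate_args

    def log_prob(self, value):
        if self.validate_args and value < 0:
            raise ValueError("value out of support")
        return 0.0  # placeholder density


class BrokenDerived:
    """Bug: the flag is silently dropped, so validation still runs."""

    def __init__(self, validate_args=True):
        self.base = Base()  # validate_args not forwarded

    def log_prob(self, value):
        return self.base.log_prob(value)


class FixedDerived:
    """Fix: honor validate_args by forwarding it to the base."""

    def __init__(self, validate_args=True):
        self.base = Base(validate_args=validate_args)

    def log_prob(self, value):
        return self.base.log_prob(value)
```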
Summary:
Fixes #53368

Pull Request resolved: #53447

Reviewed By: albanD

Differential Revision: D26946284

Pulled By: jbschlosser

fbshipit-source-id: 54e5eec7da86fa02b1b6e4a235d66976a80764fc

Co-authored-by: kshitij12345 <kshitijkalambarkar@gmail.com>
- Support transferring >2GB over CMA
- Avoid loading stub version of CUDA driver
- Don't use unsupported mmap option on older kernels
- Don't join non-existing thread if CMA is not viable

The last two manifested as uncaught exceptions (hence crashes) when initializing RPC. The first one caused same-machine RPC requests to fail.
…scriptMethods (#53519) (#53548) (#54005)

Summary:
Pull Request resolved: #53548

fixes issue faced in #53506

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D26922415

Pulled By: malfet

fbshipit-source-id: b61842827bb14cef8c7a7089b2426fa53e642c90

Co-authored-by: BowenBao <bowbao@microsoft.com>
…53328) (#53529) (#54007)

Summary:
Pull Request resolved: #53529

Supported for ONNX export after opset 10.
This is not exportable to opsets < 10 due to
1. onnx::IsInf is introduced in opset 10
2. onnx::Equal does not accept float tensor prior to opset 11

Test Plan: Imported from OSS

Reviewed By: pbelevich, malfet

Differential Revision: D26922418

Pulled By: SplitInfinity

fbshipit-source-id: 69bcba50520fa3d69db4bd4c2b9f88c00146fca7

Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary: Pull Request resolved: #52216

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26427506

Pulled By: ailzhang

fbshipit-source-id: ba4f2f66794cb2843926e5566eb4d25582f7fb2b

Co-authored-by: Ailing Zhang <ailzhang@fb.com>
#52893) (#53311) (#54019)

Summary:
Pull Request resolved: #53311

Fixes dict output & nested tuple.

Test Plan: Imported from OSS

Reviewed By: pbelevich, malfet

Differential Revision: D26922426

Pulled By: SplitInfinity

fbshipit-source-id: c2c6b71c8d978b990181e0b025626dbf6ef2199e
Summary:
To be in-sync with #53447

Pull Request resolved: #53931

Reviewed By: ngimel

Differential Revision: D27026616

Pulled By: malfet

fbshipit-source-id: 4c50b29fa296c90aeeeb1757bdaada92cbba33d4
Summary:
Updating Kineto to include bugfixes for 1.8.1

Test Plan: CI
…= 24 * n (#54015)

* Disabling dispatch to OneDNN for group convolutions when group size is 24 * n

* Add condition to non-zero grps

Co-authored-by: Vitaly Fedyunin <vitaly.fedyunin@gmail.com>
Summary:
Follow-up of #53447

Reference: #53447 (comment)

Pull Request resolved: #53809

Reviewed By: bdhirsh

Differential Revision: D27049643

Pulled By: jbschlosser

fbshipit-source-id: 623a2a254783b86391dc2b0777b688506adb4c0e

Co-authored-by: kshitij12345 <kshitijkalambarkar@gmail.com>
Summary:
Since `char` is not guaranteed to be signed on all platforms (it is unsigned on ARM)
Fixes #52146

Pull Request resolved: #52616

Test Plan: Run ` python3 -c "import torch;a=torch.tensor([-1], dtype=torch.int8);print(a.tolist())"` on arm-linux system

Reviewed By: walterddr

Differential Revision: D26586678

Pulled By: malfet

fbshipit-source-id: 91972189b54f86add516ffb96d579acb0bc13311
Summary:
When compiled with OpenMP support, `ideep`'s computational_cache caches the max number of OpenMP workers.
This number can become stale after a `torch.set_num_threads` call, so clear the cache after the call.

Fixes #53565

Pull Request resolved: #53871

Reviewed By: albanD

Differential Revision: D27003265

Pulled By: malfet

fbshipit-source-id: 1d84c23070eafb3d444e09590d64f97f99ae9d36
Co-authored-by: Joel Benjamin Schlosser <jbschlosser@fb.com>
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
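The ideep fix above follows a general pattern: invalidate any cache keyed on a global setting whenever that setting changes. A stdlib-only analogy (the names here are illustrative, not ideep's or PyTorch's API):

```python
import functools

_num_threads = 4  # stand-in for the global OpenMP thread count

@functools.lru_cache(maxsize=None)
def max_workers():
    # Cached on first use, like ideep's computational_cache.
    return _num_threads

def set_num_threads(n):
    global _num_threads
    _num_threads = n
    # The fix: clear the cache so a stale thread count is not reused.
    max_workers.cache_clear()
```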
@codecov

codecov bot commented Mar 20, 2021

Codecov Report

Merging #51995 (b6f4980) into orig/release/1.8 (9112f4e) will increase coverage by 12.61%.
The diff coverage is 67.96%.

❗ Current head b6f4980 differs from pull request most recent head f3c950e. Consider uploading reports for the commit f3c950e to get more accurate results

@@                  Coverage Diff                  @@
##           orig/release/1.8   #51995       +/-   ##
=====================================================
+ Coverage             67.88%   80.49%   +12.61%     
=====================================================
  Files                  1790     1949      +159     
  Lines                181216   213390    +32174     
=====================================================
+ Hits                 123025   171776    +48751     
+ Misses                58191    41614    -16577     
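The percentages in the diff above are hits ÷ lines; matching the reported 67.88% and 80.49% requires truncating (not rounding) to two decimals, which is an assumption about Codecov's formatting:

```python
def coverage_pct(hits, lines):
    # Two-decimal truncation; matches the 67.88% / 80.49% shown above,
    # assuming Codecov truncates rather than rounds.
    return int(hits / lines * 10000) / 100

base = coverage_pct(123025, 181216)   # base branch coverage
head = coverage_pct(171776, 213390)   # this PR's coverage
delta = round(head - base, 2)         # the +12.61% in the report
```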

mattip and others added 5 commits March 23, 2021 11:23
Summary:
Benchmark of
```python
%timeit torch.randperm(100000, device='cuda'); torch.cuda.synchronize()
```
thrust:
```
5.76 ms ± 42.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
cub:
```
3.02 ms ± 32.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

sync in thrust sort is removed

Warning:
Thrust supports 64bit indexing, but cub doesn't, so this is a functional regression. However, `torch.randperm(2**31, device='cuda')` fails with OOM on 40GB A100, and `torch.randperm(2**32, device='cuda')` fails with OOM on 80GB A100, so I think this functional regression has low impact and is acceptable.

Pull Request resolved: #53841

Reviewed By: albanD

Differential Revision: D26993453

Pulled By: ngimel

fbshipit-source-id: 39dd128559d53dbb01cab1585e5462cb5f3cceca

Co-authored-by: Xiang Gao <qasdfgtyuiop@gmail.com>
Some users building PyTorch from source on old glibc versions hit an issue where TensorPipe uses the process_vm_readv syscall, which their glibc does not wrap. This PR checks for that condition in CMake and disables that backend in such cases.

This should have no effect on PyTorch's official builds, it should just help people who are building from source.
* [CI]Install older cmath during Windows build (#54431)

Summary:
Based on peterjc123's analysis, `cmath` after microsoft/STL@26bbe2a#diff-3fa97ceb95d524432661f01d4b34509c6d261a2f7f45ddcf26f79f55b3eec88a causes a lot of CUDA code to fail to compile with:
```
error: calling a __host__ function("__copysignf") from a __host__ __device__ function("c10::guts::detail::apply_impl< ::at::native::AUnaryFunctor< ::>  &,     ::std::tuple<float >  &, (unsigned long long)0ull > ") is not allowed
```
Workaround for #54382

Pull Request resolved: #54431

Reviewed By: anjali411

Differential Revision: D27234299

Pulled By: malfet

fbshipit-source-id: b3f1fef941341222cc10cb27346fcf4a1d522a0c

* [CI] Install compatible cmath for Win binary builds (#54527)

Summary: Pull Request resolved: #54527

Reviewed By: walterddr

Differential Revision: D27269528

Pulled By: malfet

fbshipit-source-id: 4afdc706598f3a6ad296468dfb77a70433ae7d0f
…ad. (#53929) (#54358)

Summary:
Pull Request resolved: #53929

The local autograd engine performs appropriate stream synchronization
between autograd nodes in the graph to ensure a consumer's stream is
synchronized with the producer's stream before executing the consumer.

However in case of distributed autograd, the SendRpcBackward function receives
gradients over the wire and TensorPipe uses its own pool of streams for this
purpose. As a result, the tensors are received on TensorPipe's stream pool but
SendRpcBackward runs on a different stream during the backward pass and there
is no logic to synchronize these streams.

To fix this, I've enhanced DistEngine to synchronize these streams
appropriately when it receives grads over the wire.
ghstack-source-id: 124055277

(Note: this ignores all push blocking failures!)

Test Plan:
1) Added unit test which reproduced the issue.
2) waitforbuildbot.

Reviewed By: walterddr, wanchaol

Differential Revision: D27025307

fbshipit-source-id: 2944854e688e001cb3989d2741727b30d9278414

Co-authored-by: Pritam Damania <pritam.damania@fb.com>
@seemethere seemethere closed this Apr 1, 2021