[v1.8.0] Getting signal for release/1.8
#51995
Conversation
Summary: Fixes #50695. I checked locally that the concatenated license file appears at `torch-<version>.dist-info/LICENSE` in the wheel.

Pull Request resolved: #51634
Reviewed By: zhangguanheng66
Differential Revision: D26225550
Pulled By: walterddr
fbshipit-source-id: 830c59fb7aea0eb50b99e295edddad9edab6ba3a
Co-authored-by: mattip <matti.picus@gmail.com>
❌ 4 new failures as of commit 56b43f4 (more details on the Dr. CI page).

🕵️ 3 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakage:

| Job | Step | Action |
|---|---|---|
| pytorch_windows_vs2019_py36_cuda11.1_build | Build | 🔁 rerun (full log, diagnosis details) |
| pytorch_macos_10_13_py3_test | Test | 🔁 rerun |

This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
Summary: Tries to fix doc_test.

Pull Request resolved: #51825
Reviewed By: bertmaher
Differential Revision: D26295583
Pulled By: ngimel
fbshipit-source-id: 13f6e7f1675d810adfd4abd2d579e2812fe54c80
(cherry picked from commit 6c0bf28)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Co-authored-by: Natalia Gimelshein <ngimel@fb.com>
Summary: Fixes #49728. The ternary `if` operation fails in TorchScript when the condition variable is annotated as `Final`.

Tests: `pytest -k test_ternary_static_if test/test_jit.py`

Pull Request resolved: #51789
Reviewed By: gmagogsfm
Differential Revision: D26278969
Pulled By: nikithamalgifb
fbshipit-source-id: 27d1383290211503188428fb2e8b7749f59ba16e
Co-authored-by: nikithamalgi <nikithamalgi@devvm146.prn0.facebook.com>
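A minimal sketch of the previously failing pattern, assuming Python 3.8+ for `typing.Final` (older setups used `torch.jit.Final`):

```python
import torch
from typing import Final

class M(torch.nn.Module):
    flag: Final[bool]

    def __init__(self):
        super().__init__()
        self.flag = True

    def forward(self, x):
        # Ternary whose condition is a Final-annotated attribute;
        # scripting this used to fail before the fix.
        return x + 1 if self.flag else x - 1

scripted = torch.jit.script(M())
print(scripted(torch.zeros(1)))  # tensor([1.])
```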
f2464dd to 9e5bcc1 (Compare)
* Fix leaf modules in Transformer [ghstack-poisoned]
* Fix tuple type annotations [ghstack-poisoned]
* Generalize dict key check in `create-arg` (#51927)

Summary: Pull Request resolved: #51927

Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D26329655
Pulled By: jamesr66a
fbshipit-source-id: a15e7d9564551521af12a8fde1c7524856f0cbc2
Summary: Pull Request resolved: #51878

`fake_quantize_per_tensor_affine_cachemask` and `fake_quantize_per_channel_affine_cachemask` are implementation details of `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`; this removes the Python bindings for them, since there is no need to expose them.

Test Plan:
```
python test/test_quantization.py TestFakeQuantize
```
Imported from OSS
Reviewed By: albanD, bugra
Differential Revision: D26314173
fbshipit-source-id: 733c93a3951453e739b6ed46b72fbad2244f6e97
(cherry picked from commit 33afb5f)
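As a usage sketch (values are illustrative), only the public op remains callable from Python:

```python
import torch

x = torch.randn(4)
# Public API; the *_cachemask variants stay internal implementation details.
# Args: input, scale, zero_point, quant_min, quant_max.
y = torch.fake_quantize_per_tensor_affine(x, 0.1, 0, 0, 255)
print(y)
```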
Summary: Move the definition of the `copysign` template and its specialization for bfloat16/half types before the first use of `copysign` in that file, and add a comment explaining why this is necessary.

Fixes #51889

Pull Request resolved: #51900
Reviewed By: walterddr
Differential Revision: D26321741
Pulled By: malfet
fbshipit-source-id: 888858b11d9708fa140fe9c0570cc5a24599205b
Summary: This frequently happens when PyTorch compiled with CUDA support is installed on a machine that does not have NVIDIA GPUs.

Fixes #47038

Pull Request resolved: #51806
Reviewed By: ezyang
Differential Revision: D26285827
Pulled By: malfet
fbshipit-source-id: 9fd5e690d0135a2b219c1afa803fb69de9729f5e
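Illustrative only (the summary concerns machines where a CUDA-enabled build runs without a GPU); the usual availability check looks like this:

```python
import torch

# On a machine without NVIDIA GPUs this reports False instead of failing,
# and code can fall back to the CPU:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.ones(3, device=device)
```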
…ation hooks (#52215)

Co-authored-by: wayi <wayi@devgpu238.prn2.facebook.com>
Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
Summary: Pull Request resolved: #50180

Resolves the regression in #49819 by adding a copy over a background stream, similar to scatter. For internal use cases, this is gated with an env var that maintains the previous behavior when it is off.

Test Plan: CI
Reviewed By: mrshenli, ngimel
Differential Revision: D25818170
fbshipit-source-id: e50c76c035504b2a44e2be084701cee45c90df75
torch.vmap is a prototype feature and should not be in the stable binary. This PR:
- Removes the `torch.vmap` API
- Removes the documentation entry for torch.vmap
- Changes the vmap tests to use an internal API instead of torch.vmap

Test Plan: Tested locally (test_torch, test_autograd, test_type_hints, test_vmap), but also wait for CI.
Summary: Necessary to ensure correct link order, especially if libraries are linked statically. Otherwise, one might run into:
```
/usr/bin/ld: /usr/local/cuda/lib64/libcublasLt_static.a(libcublasLt_static.a.o): undefined reference to symbol 'cudaStreamWaitEvent@libcudart.so.11.0'
/usr/local/cuda/lib64/libcudart.so: error adding symbols: DSO missing from command line
```
Pull Request resolved: #52243
Reviewed By: seemethere, ngimel
Differential Revision: D26437159
Pulled By: malfet
fbshipit-source-id: 33b8bb5040bda10537833f3ad737f535488452ea
…#52406)

Summary: Pull Request resolved: #52151

CUDA 11.2 might not be as performant as we thought, so let's downgrade to something we think is more performant.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D26408314
Pulled By: seemethere
fbshipit-source-id: e2446aa0115e2c2a79718b1fdfd9fccf2072822d
(cherry picked from commit a11650b)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Summary: Fixes #39502

This PR adds support for exporting **fake_quantize_per_channel_affine** to a pair of QuantizeLinear and DequantizeLinear. Per-tensor support was added by PR #39738. The `axis` attribute of QuantizeLinear and DequantizeLinear, which is required for per-channel support, was added in opset 13 by onnx/onnx#2772.

[update 1/20/2021]: opset 13 is now supported on master, so the added function is properly tested. Code also rebased to new master.

The function was also tested offline with the following code:
```python
import torch
from torch import quantization
from torchvision import models

qat_resnet18 = models.resnet18(pretrained=True).eval().cuda()
qat_resnet18.qconfig = quantization.QConfig(
    activation=quantization.default_fake_quant,
    weight=quantization.default_per_channel_weight_fake_quant)
quantization.prepare_qat(qat_resnet18, inplace=True)
qat_resnet18.apply(quantization.enable_observer)
qat_resnet18.apply(quantization.enable_fake_quant)

dummy_input = torch.randn(16, 3, 224, 224).cuda()
_ = qat_resnet18(dummy_input)
for module in qat_resnet18.modules():
    if isinstance(module, quantization.FakeQuantize):
        module.calculate_qparams()
qat_resnet18.apply(quantization.disable_observer)

qat_resnet18.cuda()
input_names = ["actual_input_1"]
output_names = ["output1"]
torch.onnx.export(qat_resnet18, dummy_input, "quant_model.onnx",
                  verbose=True, opset_version=13)
```
It can generate the desired graph.

Pull Request resolved: #42835
Reviewed By: houseroad
Differential Revision: D26293823
Pulled By: SplitInfinity
fbshipit-source-id: 300498a2e24b7731b12fa2fbdea4e73dde80e7ea
Co-authored-by: Hao Wu <skyw@users.noreply.github.com>
Summary: This is being tested by #52441. Adds a new config for macOS arm64 to our binary builds, and now stores artifacts for mac builds.

Pull Request resolved: #52443
Reviewed By: walterddr
Differential Revision: D26517330
Pulled By: janeyx99
fbshipit-source-id: 02774937a827bdd4c08486dc9f8fe63446917f1e
Co-authored-by: eellison <eellison@fb.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Co-authored-by: peterjc123 <peterghost86@gmail.com>
Co-authored-by: Jane Xu <janeyx@fb.com>
…llLoss (#52510)

Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>
Summary: Fixes #{issue number}

Pull Request resolved: #51847
Reviewed By: albanD
Differential Revision: D26405678
Pulled By: malfet
fbshipit-source-id: 073b675225b48d1732771583f8f2473e0fdcf35c
Co-authored-by: Joe Zhu <jozh@microsoft.com>
* [FX] Cherrypick docs fixes
* Update code links to point to 1.8
Summary: For enabling amp in torch/xla, see [this](pytorch/xla#2654).

Pull Request resolved: #48570
Reviewed By: ezyang
Differential Revision: D26120627
Pulled By: ailzhang
fbshipit-source-id: 32627b17c04bfdad128624676ea9bf6f117bc97d
Co-authored-by: Chengji Yao <yaochengji@hotmail.com>
…53675)

Summary: For #47027. Some progress has been made in #50665, but in my testing, trying to unwrap the circular dependencies is turning into a never-ending quest. This PR explicitly exports things in the top-level torch module without any semantic effect, in accordance with this py.typed library guidance: https://github.com/microsoft/pyright/blob/master/docs/typed-libraries.md#library-interface

It may be possible to do some of the other fixes just using `__all__` where needed, but `__all__` has a semantic effect that I would like to review further. This PR at least fixes simple completions like `torch.nn` in Pylance/pyright.

Pull Request resolved: #52339
Reviewed By: smessmer
Differential Revision: D26694909
Pulled By: malfet
fbshipit-source-id: 99f2c6d0bf972afd4036df988e3acae857dde3e1
Co-authored-by: Jake Bailey <5341706+jakebailey@users.noreply.github.com>
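A hypothetical `__init__.py` fragment showing the explicit re-export convention from that guidance (package and symbol names here are made up, not the actual torch sources):

```python
# mypackage/__init__.py
# Under py.typed rules, a name is public when re-exported under the same name:
# "import X as X" or "from m import X as X".
from . import nn as nn               # submodule explicitly re-exported
from ._impl import helper as helper  # function explicitly re-exported
from ._impl import _private_helper   # underscore name: not part of the API
```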
Summary: Pull Request resolved: #53133

In light of some issues where users were having trouble installing CUDA-specific versions of PyTorch, we should no longer give special privileges to CUDA 10.2. I recently added scripts/release/promote/prep_binary_for_pypi.sh (#53056) to make it so that we could theoretically promote any wheel we publish to download.pytorch.org to PyPI.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D26759823
Pulled By: seemethere
fbshipit-source-id: 2d2b29e7fef0f48c23f3c853bdca6144b7c61f22
(cherry picked from commit b8546bd)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Summary: Fixes #51801. Updated the LSTMCell example.

Pull Request resolved: #51983
Reviewed By: agolynski
Differential Revision: D26467104
Pulled By: zou3519
fbshipit-source-id: 31c8bf89b21cd2f748b2cc28a74169082d81503c
Co-authored-by: CarlosJose126 <43588143+CarlosJose126@users.noreply.github.com>
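For reference, a sketch in the spirit of the corrected example (shapes chosen for illustration):

```python
import torch
import torch.nn as nn

rnn = nn.LSTMCell(10, 20)           # (input_size, hidden_size)
inp = torch.randn(2, 3, 10)         # (time_steps, batch, input_size)
hx = torch.randn(3, 20)             # (batch, hidden_size)
cx = torch.randn(3, 20)
output = []
for i in range(inp.size(0)):
    hx, cx = rnn(inp[i], (hx, cx))  # one step per time slice
    output.append(hx)
output = torch.stack(output, dim=0)  # (time_steps, batch, hidden_size)
```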
* Add sample validation for LKJCholesky.log_prob
* Fix distributions which don't properly honor validate_args=False

A number of derived distributions use base distributions in their implementation. We add what we hope is a comprehensive test of whether all distributions actually honor skipping validation of arguments in log_prob, and then fix the bugs we found. These bugs are particularly cumbersome in PyTorch 1.8 and master, where validate_args is turned on by default. In addition, one might argue that validate_args does not perform as well as it should when the default is not to validate but validation is turned on at instantiation.

Arguably, there is another set of bugs, or at least inconsistencies, where validation of inputs does not prevent invalid indices in sample validation (with validation on, an IndexError is raised in the test). We would encourage the implementors to be more ambitious when validation is turned on and to amend sample validation to throw a ValueError for consistency.

* Additional fixes to distributions
* Address failing tests

Co-authored-by: neerajprad <neerajprad@devvm903.atn0.facebook.com>
Co-authored-by: Thomas Viehmann <tv.code@beamnet.de>
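A minimal sketch of the contract being tested, using `Bernoulli` as an arbitrary example:

```python
import torch
from torch.distributions import Bernoulli

unchecked = Bernoulli(probs=torch.tensor(0.5), validate_args=False)
# With validation off, log_prob must not raise on out-of-support values
# (the result is simply meaningless):
unchecked.log_prob(torch.tensor(2.0))

checked = Bernoulli(probs=torch.tensor(0.5), validate_args=True)
# checked.log_prob(torch.tensor(2.0))  # raises ValueError: out of support
```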
- Support transferring >2GB over CMA
- Avoid loading stub version of CUDA driver
- Don't use unsupported mmap option on older kernels
- Don't join non-existing thread if CMA is not viable

The last two manifested as uncaught exceptions (hence crashes) when initializing RPC. The first one caused same-machine RPC requests to fail.
…scriptMethods (#53519) (#53548) (#54005)

Summary: Pull Request resolved: #53548

Fixes the issue faced in #53506.

Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D26922415
Pulled By: malfet
fbshipit-source-id: b61842827bb14cef8c7a7089b2426fa53e642c90
Co-authored-by: BowenBao <bowbao@microsoft.com>
…53328) (#53529) (#54007)

Summary: Pull Request resolved: #53529

Supported for ONNX export after opset 10. This is not exportable to opsets < 10 because:
1. onnx::IsInf was introduced in opset 10
2. onnx::Equal does not accept float tensors prior to opset 11

Test Plan: Imported from OSS
Reviewed By: pbelevich, malfet
Differential Revision: D26922418
Pulled By: SplitInfinity
fbshipit-source-id: 69bcba50520fa3d69db4bd4c2b9f88c00146fca7
Co-authored-by: BowenBao <bowbao@microsoft.com>
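Assuming the operator in question lowers to onnx::IsInf (`torch.isinf` is the natural example), an illustrative export at the minimum supported opset:

```python
import torch

class HasInf(torch.nn.Module):
    def forward(self, x):
        # isinf maps to onnx::IsInf, available only from opset 10 onward
        return torch.isinf(x)

torch.onnx.export(HasInf(), torch.randn(4), "isinf.onnx", opset_version=10)
```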
Summary: Pull Request resolved: #52216

Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D26427506
Pulled By: ailzhang
fbshipit-source-id: ba4f2f66794cb2843926e5566eb4d25582f7fb2b
Co-authored-by: Ailing Zhang <ailzhang@fb.com>
Summary: Update Kineto to include bugfixes for 1.8.1.

Test Plan: CI
…= 24 * n (#54015)

* Disable dispatch to OneDNN for group convolutions when group size is 24 * n
* Add condition for non-zero groups

Co-authored-by: Vitaly Fedyunin <vitaly.fedyunin@gmail.com>
Summary: Follow-up of #53447. Reference: #53447 (comment)

Pull Request resolved: #53809
Reviewed By: bdhirsh
Differential Revision: D27049643
Pulled By: jbschlosser
fbshipit-source-id: 623a2a254783b86391dc2b0777b688506adb4c0e
Co-authored-by: kshitij12345 <kshitijkalambarkar@gmail.com>
Summary: `char` is not guaranteed to be signed on all platforms (it is unsigned on ARM).

Fixes #52146

Pull Request resolved: #52616

Test Plan: Run `python3 -c "import torch;a=torch.tensor([-1], dtype=torch.int8);print(a.tolist())"` on an arm-linux system.

Reviewed By: walterddr
Differential Revision: D26586678
Pulled By: malfet
fbshipit-source-id: 91972189b54f86add516ffb96d579acb0bc13311
Summary: When compiled with OpenMP support, `ideep`'s `computational_cache` would cache the maximum number of OpenMP workers. This number can be wrong after a `torch.set_num_threads` call, so clear the cache after that call.

Fixes #53565

Pull Request resolved: #53871
Reviewed By: albanD
Differential Revision: D27003265
Pulled By: malfet
fbshipit-source-id: 1d84c23070eafb3d444e09590d64f97f99ae9d36
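A short sketch of the API involved:

```python
import torch

torch.set_num_threads(2)        # the call after which the stale cache is now cleared
print(torch.get_num_threads())  # 2
x = torch.randn(256, 256)
y = x @ x                       # intraop work runs with the updated thread count
```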
Co-authored-by: Joel Benjamin Schlosser <jbschlosser@fb.com>
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Codecov Report
```
@@               Coverage Diff                @@
##       orig/release/1.8    #51995     +/-  ##
================================================
+ Coverage       67.88%    80.49%    +12.61%
================================================
  Files            1790      1949       +159
  Lines          181216    213390     +32174
================================================
+ Hits           123025    171776     +48751
+ Misses          58191     41614     -16577
```
Summary: Benchmark of
```python
%timeit torch.randperm(100000, device='cuda'); torch.cuda.synchronize()
```
thrust:
```
5.76 ms ± 42.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
cub:
```
3.02 ms ± 32.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
The sync in the thrust sort is removed.

Warning: Thrust supports 64-bit indexing but cub doesn't, so this is a functional regression. However, `torch.randperm(2**31, device='cuda')` fails with OOM on a 40GB A100, and `torch.randperm(2**32, device='cuda')` fails with OOM on an 80GB A100, so I think this functional regression has low impact and is acceptable.

Pull Request resolved: #53841
Reviewed By: albanD
Differential Revision: D26993453
Pulled By: ngimel
fbshipit-source-id: 39dd128559d53dbb01cab1585e5462cb5f3cceca
Co-authored-by: Xiang Gao <qasdfgtyuiop@gmail.com>
Some users who are building from source on old glibc versions are hitting an issue where TensorPipe uses the process_vm_readv syscall, which is not wrapped by glibc. This PR tries to check that condition in CMake and disable that backend in such cases. This should have no effect on PyTorch's official builds; it should just help people who are building from source.
* [CI] Install older cmath during Windows build (#54431)

Summary: Based on peterjc123's analysis, `cmath` after microsoft/STL@26bbe2a#diff-3fa97ceb95d524432661f01d4b34509c6d261a2f7f45ddcf26f79f55b3eec88a renders a lot of CUDA code unable to compile with:
```
error: calling a __host__ function("__copysignf") from a __host__ __device__ function("c10::guts::detail::apply_impl< ::at::native::AUnaryFunctor< ::> &, ::std::tuple<float > &, (unsigned long long)0ull > ") is not allowed
```
Workaround for #54382

Pull Request resolved: #54431
Reviewed By: anjali411
Differential Revision: D27234299
Pulled By: malfet
fbshipit-source-id: b3f1fef941341222cc10cb27346fcf4a1d522a0c

* [CI] Install compatible cmath for Win binary builds (#54527)

Summary: Pull Request resolved: #54527

Reviewed By: walterddr
Differential Revision: D27269528
Pulled By: malfet
fbshipit-source-id: 4afdc706598f3a6ad296468dfb77a70433ae7d0f
…ad. (#53929) (#54358)

Summary: Pull Request resolved: #53929

The local autograd engine performs appropriate stream synchronization between autograd nodes in the graph to ensure a consumer's stream is synchronized with the producer's stream before executing the consumer. However, in the case of distributed autograd, the SendRpcBackward function receives gradients over the wire, and TensorPipe uses its own pool of streams for this purpose. As a result, the tensors are received on TensorPipe's stream pool, but SendRpcBackward runs on a different stream during the backward pass, and there is no logic to synchronize these streams.

To fix this, I've enhanced DistEngine to synchronize these streams appropriately when it receives grads over the wire.

ghstack-source-id: 124055277

(Note: this ignores all push blocking failures!)

Test Plan:
1) Added unit test which reproduced the issue.
2) waitforbuildbot

Reviewed By: walterddr, wanchaol
Differential Revision: D27025307
fbshipit-source-id: 2944854e688e001cb3989d2741727b30d9278414
Co-authored-by: Pritam Damania <pritam.damania@fb.com>
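A generic illustration of the producer/consumer stream synchronization described above (requires a CUDA device; this is not the DistEngine code itself, which does the equivalent internally):

```python
import torch

producer = torch.cuda.Stream()
x = torch.randn(1 << 20, device="cuda")
with torch.cuda.stream(producer):
    grad = x * 2  # "received" tensor, produced on a side stream
# The consumer (current) stream must wait for the producer before using grad:
torch.cuda.current_stream().wait_stream(producer)
grad.record_stream(torch.cuda.current_stream())  # allocator bookkeeping
out = grad + 1  # safe: ordered after the producer's kernels
```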
No description provided.