
Removing cuda 102 #6649

Merged 3 commits into pytorch:main on Sep 28, 2022
Conversation

@atalman (Contributor) commented on Sep 26, 2022:

In preparation for Release 1.13, removing CUDA 10.2.

Follow up work required after this is merged:
#6655

@malfet (Contributor) left a comment:

This looks like much more than just removing 10.2.

As we have no plans of yanking the manylinux-cuda102 images from Docker Hub, perhaps we can keep them as the defaults for CPU builds (and investigate, perhaps in separate PRs, why the CPU docker image could not be used).

Also, I thought you had already moved the cmake tests to cuda116 in another PR, hadn't you? If that has not been done yet, can it again be a separate PR, so that we get a clear signal that the move did not cause any regressions?

.circleci/config.yml.in (outdated; resolved)
.circleci/config.yml.in (outdated; resolved)
.circleci/regenerate.py (outdated; resolved)
@datumbox (Contributor) commented:
The failing prototype tests are unrelated and PyTorch Data is looking into it.

The problem at unittest_linux_gpu_py3.8 is related:

RuntimeError: The current installed version of g++ (9.3.1) is greater than the maximum required version by CUDA 10.2 (8.0.0). Please make sure to use an adequate version of g++ (>=5.0.0, <=8.0.0).

Possibly related to @atalman's #6649 (comment)
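For context, the error above comes from CUDA 10.2's host-compiler gate, which rejects any g++ newer than the supported maximum. A minimal Python sketch of that comparison (illustrative only, not nvcc's actual implementation):

```python
# Minimal sketch (not nvcc's actual code) of the host-compiler check behind
# the error above: CUDA 10.2 supports g++ >=5.0.0 and <=8.0.0, so the CI
# image's g++ 9.3.1 fails. Versions are compared numerically, per component.

def parse_version(s):
    """Turn '9.3.1' into (9, 3, 1) for numeric comparison."""
    return tuple(int(part) for part in s.split("."))

def gxx_ok_for_cuda102(gxx_version, lo="5.0.0", hi="8.0.0"):
    """True if g++ falls in CUDA 10.2's supported range (>=5.0.0, <=8.0.0)."""
    v = parse_version(gxx_version)
    return parse_version(lo) <= v <= parse_version(hi)

print(gxx_ok_for_cuda102("9.3.1"))  # -> False (the failing CI case)
print(gxx_ok_for_cuda102("7.5.0"))  # -> True
```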

@atalman (Contributor, Author) commented on Sep 26, 2022:

> The failing prototype tests are unrelated and PyTorch Data is looking into it.
>
> The problem at unittest_linux_gpu_py3.8 is related:
>
> RuntimeError: The current installed version of g++ (9.3.1) is greater than the maximum required version by CUDA 10.2 (8.0.0). Please make sure to use an adequate version of g++ (>=5.0.0, <=8.0.0).
>
> Possibly related to @atalman's #6649 (comment)

Yes, it looks like the CU_VERSION env var for unittest_linux_gpu_py3.8 is for some reason set to cu102. Trying to find out why now.
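For reference, the CI scripts derive the conda package spec from the CU_VERSION variable; a hypothetical sketch of that mapping (the helper name and spec strings are illustrative, not the actual torchvision CI code):

```python
# Hypothetical sketch of mapping the CU_VERSION env var (e.g. "cu116",
# "cu102", "cpu") to the spec that install.sh passes to conda. The helper
# name and the spec strings are illustrative, not the real CI code.
import re

def cudatoolkit_spec(cu_version):
    if cu_version == "cpu":
        return "cpuonly"
    m = re.fullmatch(r"cu(\d+)(\d)", cu_version)
    if m is None:
        raise ValueError(f"unrecognized CU_VERSION: {cu_version!r}")
    # Last digit is the minor CUDA version, the rest is the major version.
    return f"pytorch-cuda={m.group(1)}.{m.group(2)}"

print(cudatoolkit_spec("cu116"))  # -> pytorch-cuda=11.6
print(cudatoolkit_spec("cu102"))  # -> pytorch-cuda=10.2
```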

@atalman atalman requested a review from malfet September 26, 2022 22:16
.circleci/unittest/linux/scripts/install.sh (outdated; resolved)
@@ -37,7 +33,8 @@ printf "Installing PyTorch with %s\n" "${cudatoolkit}"
 if [ "${os}" == "MacOSX" ]; then
     conda install -y -c "pytorch-${UPLOAD_CHANNEL}" "pytorch-${UPLOAD_CHANNEL}"::pytorch "${cudatoolkit}"
 else
-    conda install -y -c "pytorch-${UPLOAD_CHANNEL}" -c nvidia "pytorch-${UPLOAD_CHANNEL}"::pytorch[build="*${version}*"] "${cudatoolkit}"
+    printf "conda install -y pytorch ${cudatoolkit} -c pytorch-${UPLOAD_CHANNEL} -c nvidia"
+    conda install -y pytorch "${cudatoolkit}" -c "pytorch-${UPLOAD_CHANNEL}" -c nvidia
@malfet (Contributor) commented:
This does not seem to be equivalent to the previous statement (i.e. it removes the exact pytorch version constraint).

@atalman (Contributor, Author) commented on Sep 27, 2022:

Honestly, I don't think we need it here. I think we are better off sticking with the simpler solution, the same as the get started page; it would be easier to maintain. ${version} here is just the CUDA version.

@datumbox (Contributor) commented:
Same concern here. @atalman are you saying you tested and it's unnecessary?

@atalman (Contributor, Author) commented:
Yes, I tested it. The old install command would yield:

conda install -y -c pytorch-nightly -c nvidia -c pytorch-nightly::pytorch[build="*cu117*"] pytorch-cuda=11.7

new one is the same as in our get started page:

conda install pytorch pytorch-cuda=11.7 -c pytorch-nightly -c nvidia
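For context on what the dropped build constraint did: conda matches a spec's build field as a glob against each candidate package's build string, so pytorch[build="*cu117*"] restricts the solver to CUDA builds. A rough sketch of that filtering (not conda's internals; the build strings below are hypothetical):

```python
# Rough sketch (not conda's internals) of build-string matching: a spec like
# pytorch[build="*cu117*"] only admits packages whose build string matches
# the glob, excluding CPU builds. Build strings here are hypothetical.
from fnmatch import fnmatch

candidates = ["py3.8_cpu_0", "py3.8_cu117_cudnn8_0"]

def admitted(build_glob, builds):
    """Return the candidate build strings matching the spec's build glob."""
    return [b for b in builds if fnmatch(b, build_glob)]

print(admitted("*cu117*", candidates))  # -> ['py3.8_cu117_cudnn8_0']
print(admitted("*", candidates))        # without the constraint, both qualify
```

This is the crux of the disagreement below: without the build glob, the solver is free to pick a CPU build.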

@malfet (Contributor) commented:

I would insist that we keep the old command (you can submit a BE change separately). The new command opens us up to the regression we saw in torchaudio yesterday, when the CPU version was installed instead of the CUDA one.

@atalman (Contributor, Author) commented:
This is a unit test; if it fails, it means we have a real problem with the pytorch binary.

.circleci/regenerate.py (resolved)
@datumbox (Contributor) left a comment:

From my side, with my limited knowledge of the build scripts, it looks good. Just a question below.

I think we could consider merging it after #6660 and after @malfet unblocks. We should aim for all CI to be green prior to merging, to avoid accidental breakages.


@malfet (Contributor) left a comment:

LGTM

@datumbox (Contributor) commented:
@atalman @malfet Thanks for the changes.

Note that we have a couple of issues with flaky tests these days. A couple of models are throwing OOM errors on GPU. We suspect it's related to CircleCI, and it comes and goes. If you see such an issue, you can safely ignore it. We are monitoring how often and how regularly it breaks, and we will decide whether to disable the tests if necessary.

Commits:

- Display cuda info
- Address comments; try to resolve CUDA version issue
- More work
- Base debugging
- Fix cuda version passing
- Testing
- Adding config.yml
- Adding command we use for pytorch vision install
- Adding unit tests
- Modify install command
- Refactor config.in
- Move cpu tests to different PR
- Remove debug code
- Testing similar exception for linux as windows
- update test_models.py
- Revert "Testing similar exception for linux as windows" (reverts commit 4aaee0b)
- Revert "update test_models.py" (reverts commit 413651a)
@atalman atalman merged commit dc07ac2 into pytorch:main Sep 28, 2022
@github-actions (bot) commented:
Hey @atalman!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

@datumbox added the "topic: build" and "other if you have no clue or if you will manually handle the PR in the release notes" labels on Sep 28, 2022
facebook-github-bot pushed a commit that referenced this pull request Oct 7, 2022
Summary:
* Removing cuda 102

Display cuda info

Address comments try to resolve CUDA version issue

More work

Base debugging

Fix cuda version passing

Testing

Adding config.yml

Adding command we use for pytorch vision install

Adding unit tests

Modify install command

Refactor config.in

Move cpu tests to different PR

Remove debug code

Testing similar exception for linux as windows

update test_models.py

Revert "Testing similar exception for linux as windows"

This reverts commit 4aaee0b.

Revert "update test_models.py"

This reverts commit 413651a.

* Removing debug statement

* Reverting to old command

Reviewed By: datumbox

Differential Revision: D40138739

fbshipit-source-id: 8ebe2de9a5fedb0da906825929599e44e9cb0207
Labels: cla signed, other if you have no clue or if you will manually handle the PR in the release notes, topic: build