-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removing cuda 102 #6649
Removing cuda 102 #6649
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much more than just removing 102
As we have no plans of yanking manylinux-cuda102 images from dockerhub, perhaps we can keep them as defaults for CPU builds (but investigate perhaps in separate PRs, why CPU docker image could not be used)
Also, I though you've already moved cmake tests to cuda116 in another PR haven't you? If this has not been done yet, again, can it be separate so that we have a clear signal that move did not cause any regressions.
The failing prototype tests are unrelated and PyTorch Data is looking into it. The problem at unittest_linux_gpu_py3.8 is related:
Possibly related to @atalman's #6649 (comment) |
Yes looks like for some reason unittest_linux_gpu_py3.8 CU_VERSION env var is set to cu102. Trying to find out why now. |
@@ -37,7 +33,8 @@ printf "Installing PyTorch with %s\n" "${cudatoolkit}" | |||
if [ "${os}" == "MacOSX" ]; then | |||
conda install -y -c "pytorch-${UPLOAD_CHANNEL}" "pytorch-${UPLOAD_CHANNEL}"::pytorch "${cudatoolkit}" | |||
else | |||
conda install -y -c "pytorch-${UPLOAD_CHANNEL}" -c nvidia "pytorch-${UPLOAD_CHANNEL}"::pytorch[build="*${version}*"] "${cudatoolkit}" | |||
printf "conda install -y pytorch ${cudatoolkit} -c pytorch-${UPLOAD_CHANNEL} -c nvidia" | |||
conda install -y pytorch "${cudatoolkit}" -c "pytorch-${UPLOAD_CHANNEL}" -c nvidia |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not seem to be equivalent to the previous statement (i.e. it remove exact pytorch version contraint)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need it here honestly. I think we better stick to simpler solution same as get started page. Would be easier to maintain. ${version} here is just cuda version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern here. @atalman are you saying you tested and it's unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes totally old install command would yield:
conda install -y -c pytorch-nightly -c nvidia -c pytorch-nightly::pytorch[build="*cu117*"] pytorch-cuda=11.7
new one is the same as in our get started page:
conda install pytorch pytorch-cuda=11.7 -c pytorch-nightly -c nvidia
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would insist that we should keep old command (you can submit a BE change separately)
New commands opens up regression we've seen in torchaudio yesterday, when CPU version were installed instead of cuda one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unit test if it fails means we have real problem with pytorch binary
413651a
to
4ee0a2a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -37,7 +33,8 @@ printf "Installing PyTorch with %s\n" "${cudatoolkit}" | |||
if [ "${os}" == "MacOSX" ]; then | |||
conda install -y -c "pytorch-${UPLOAD_CHANNEL}" "pytorch-${UPLOAD_CHANNEL}"::pytorch "${cudatoolkit}" | |||
else | |||
conda install -y -c "pytorch-${UPLOAD_CHANNEL}" -c nvidia "pytorch-${UPLOAD_CHANNEL}"::pytorch[build="*${version}*"] "${cudatoolkit}" | |||
printf "conda install -y pytorch ${cudatoolkit} -c pytorch-${UPLOAD_CHANNEL} -c nvidia" | |||
conda install -y pytorch "${cudatoolkit}" -c "pytorch-${UPLOAD_CHANNEL}" -c nvidia |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern here. @atalman are you saying you tested and it's unnecessary?
b6b4221
to
9a894c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@atalman @malfet Thanks for the changes. Note that we have a couple of issues with flaky tests these days. A couple of models are throwing OOM errors on GPU. We suspect it's related to CircleCI and it comes and goes. If you see such an issue, you can safely ignore it. We are monitoring it to see how often and how regularly it breaks and we will decide to disable it if necessary. |
Display cuda info Address comments try to resolve CUDA version issue More work Base debugging Fix cuda version passing Testing Adding config.yml Adding command we use for pytorch vision install Adding unit tests Modify install command Refactor config.in Move cpu tests to different PR Remove debug code Testing similar exception for linux as windows update test_models.py Revert "Testing similar exception for linux as windows" This reverts commit 4aaee0b. Revert "update test_models.py" This reverts commit 413651a.
74d478e
to
0abc003
Compare
Hey @atalman! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py |
Summary: * Removing cuda 102 Display cuda info Address comments try to resolve CUDA version issue More work Base debugging Fix cuda version passing Testing Adding config.yml Adding command we use for pytorch vision install Adding unit tests Modify install command Refactor config.in Move cpu tests to different PR Remove debug code Testing similar exception for linux as windows update test_models.py Revert "Testing similar exception for linux as windows" This reverts commit 4aaee0b. Revert "update test_models.py" This reverts commit 413651a. * Removing debug statement * Reverting to old command Reviewed By: datumbox Differential Revision: D40138739 fbshipit-source-id: 8ebe2de9a5fedb0da906825929599e44e9cb0207
In preparation for Release 1.13 removing CUDA 10.2
Follow up work required after this is merged:
#6655