Description
🐛 Bug
A Caffe2 operation in PyTorch does not respect the current CUDA stream. The operation replaces the current CUDA stream with its own stream and does not restore it afterwards. As a result:
- All CUDA kernels launched after the Caffe2 operation run on the switched stream.
- A Caffe2 operation can nondeterministically produce a corrupted result if it depends on a tensor computed on the original CUDA stream (see the sketch below).
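For background, here is a minimal sketch of the kind of cross-stream race involved. It does not use Caffe2, and the stream and tensor names are illustrative: a consumer running on a different stream needs an explicit wait on the producer's stream, which is exactly the synchronization a silent stream switch can bypass.

```python
import torch

producer = torch.cuda.current_stream()   # stream where the input is computed
consumer = torch.cuda.Stream()           # stands in for the Caffe2-owned stream

x = torch.randn(1 << 22, device='cuda')
x = x * 2  # kernel enqueued on `producer`

with torch.cuda.stream(consumer):
    # Required for correctness: wait until `producer` has finished writing `x`
    # before reading it on `consumer`.
    consumer.wait_stream(producer)
    y = x + 1  # without the wait above, this may read a stale `x`
```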
To Reproduce
Steps to reproduce the behavior:
- Check the current CUDA stream in PyTorch.
- Call any Caffe2 operation in PyTorch.
- Check the current CUDA stream in PyTorch again.
>>> import torch
# The current CUDA stream is the default stream.
>>> torch.cuda.current_stream()
<torch.cuda.Stream device=cuda:0 cuda_stream=0x0>
>>> x = torch.rand(1).cuda()
>>> torch.ops._caffe2.AliasWithName(x, 'x')
tensor([0.8422], device='cuda:0')
# After a Caffe2 operation, the current CUDA stream has been changed.
>>> torch.cuda.current_stream()
<torch.cuda.Stream device=cuda:0 cuda_stream=0x561aad064c40>
Expected behavior
The stream reported by the second `torch.cuda.current_stream()` call should be the same as the first one; a Caffe2 operation should not change the current CUDA stream.
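A small check that expresses this expectation, following the reproduction above (on the affected versions the assertion fails):

```python
import torch

before = torch.cuda.current_stream()
x = torch.rand(1, device='cuda')
torch.ops._caffe2.AliasWithName(x, 'x')
after = torch.cuda.current_stream()

# Compare the raw cudaStream_t pointers; they should be identical.
assert before.cuda_stream == after.cuda_stream, \
    "Caffe2 op changed the current CUDA stream"
```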
Environment
I reproduced this issue on both PyTorch 1.5.0 and 1.4.0.
PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: Tesla V100-SXM2-32GB
Nvidia driver version: 418.116.00
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] torch==1.5.0
[pip] torchvision==0.6.0a0+82fd1c8
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] mkl 2020.0 166
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] numpy 1.18.1 py37h4f9e942_0
[conda] numpy-base 1.18.1 py37hde5b4d6_1
[conda] pytorch 1.5.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] torchvision 0.6.0 py37_cu101 pytorch
Additional context
I found this issue while trying to deploy Faster R-CNN from Detectron2. The first output of the deployed model nondeterministically differs from the later outputs:
model = torch.jit.load('model.ts')
y1 = model(x)
y2 = model(x)
y3 = model(x)
# y1 != y2  (the first output differs, nondeterministically)
# y2 == y3  (subsequent outputs agree)
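For reference, a sketch of how the outputs can be compared; `outputs_equal` is a hypothetical helper, and it assumes the traced model returns a tensor or a flat tuple of tensors:

```python
import torch

def outputs_equal(a, b):
    # Normalize a tensor / tuple-of-tensors output and compare exactly.
    ta = a if isinstance(a, (list, tuple)) else (a,)
    tb = b if isinstance(b, (list, tuple)) else (b,)
    return len(ta) == len(tb) and all(torch.equal(p, q) for p, q in zip(ta, tb))

print(outputs_equal(y1, y2))  # False: the first call is corrupted nondeterministically
print(outputs_equal(y2, y3))  # True
```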
The model calls torch.ops._caffe2.GenerateProposals() internally (code). I profiled this call together with the rpn_head() call it depends on, and I could see the unexpected CUDA stream switch at GenerateProposals in the trace.
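Roughly how the profiling was done (a sketch, not the exact commands; `model` and `x` are from the snippet above):

```python
import torch

# Profile one forward pass with CUDA timing enabled, then inspect the trace.
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    y = model(x)

print(prof.key_averages().table(sort_by="cuda_time_total"))
prof.export_chrome_trace("trace.json")  # view in chrome://tracing
```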
One possible workaround is to call an arbitrary Caffe2 operation at the beginning of the model, so that the stream switch happens before any real work is launched:
def forward(self, input):
    # Dummy Caffe2 call; the result is discarded, it only triggers the stream switch.
    torch.ops._caffe2.AliasWithName(input, 'input')
    return self.original_model(input)
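A sketch of how this can be wired up around the loaded TorchScript model; `StreamFixWrapper` is an illustrative name, not something from Detectron2 or PyTorch:

```python
import torch

class StreamFixWrapper(torch.nn.Module):
    """Workaround wrapper: runs a cheap Caffe2 op before the real model."""

    def __init__(self, original_model):
        super().__init__()
        self.original_model = original_model

    def forward(self, input):
        # The output of AliasWithName is intentionally discarded; the call
        # exists only to switch streams before the wrapped model runs.
        torch.ops._caffe2.AliasWithName(input, 'input')
        return self.original_model(input)

model = StreamFixWrapper(torch.jit.load('model.ts'))
y = model(x)  # `x` as in the snippet above
```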
cc @ngimel
Activity
glaringlee commented on Jun 15, 2020
@sublee
Can you switch to PyTorch APIs? Caffe2 is being deprecated and migrated into the PyTorch codebase.
sublee commented on Jun 16, 2020
Thanks for the reply. Since using Caffe2 is a choice made by Detectron2 (I don't understand why), I think it's hard for me to switch to proper PyTorch APIs. If this is a Caffe2+PyTorch bug but it won't be fixed, I'll handle my case with the workaround I described above.