Description
🐛 Bug
A Caffe2 operation in PyTorch does not respect the current CUDA stream. The operation replaces the current CUDA stream with its own stream and does not restore it afterwards. As a result:
- All CUDA kernels launched after the Caffe2 operation run on the switched stream.
- A Caffe2 operation can nondeterministically produce a corrupted result if it depends on a tensor computed on the original CUDA stream (see the sketch below).
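For background, here is a minimal sketch of the kind of cross-stream race involved. It does not use Caffe2, and the stream and tensor names are illustrative: a consumer running on a different stream needs an explicit wait on the producer's stream, which is exactly the synchronization a silent stream switch can bypass.

```python
import torch

producer = torch.cuda.current_stream()   # stream where the input is computed
consumer = torch.cuda.Stream()           # stands in for the Caffe2-owned stream

x = torch.randn(1 << 22, device='cuda')
x = x * 2  # kernel enqueued on `producer`

with torch.cuda.stream(consumer):
    # Required for correctness: wait until `producer` has finished writing `x`
    # before reading it on `consumer`.
    consumer.wait_stream(producer)
    y = x + 1  # without the wait above, this may read a stale `x`
```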
To Reproduce
Steps to reproduce the behavior:
- Check the current CUDA stream in PyTorch.
- Call any Caffe2 operation in PyTorch.
- Check the current CUDA stream in PyTorch again.
>>> import torch
# The current CUDA stream is the default stream.
>>> torch.cuda.current_stream()
<torch.cuda.Stream device=cuda:0 cuda_stream=0x0>
>>> x = torch.rand(1).cuda()
>>> torch.ops._caffe2.AliasWithName(x, 'x')
tensor([0.8422], device='cuda:0')
# After a Caffe2 operation, the current CUDA stream has been changed.
>>> torch.cuda.current_stream()
<torch.cuda.Stream device=cuda:0 cuda_stream=0x561aad064c40>
Expected behavior
The stream reported by the second `torch.cuda.current_stream()` call should be the same as the first one; a Caffe2 operation should not change the current CUDA stream.
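A small check that expresses this expectation, following the reproduction above (on the affected versions the assertion fails):

```python
import torch

before = torch.cuda.current_stream()
x = torch.rand(1, device='cuda')
torch.ops._caffe2.AliasWithName(x, 'x')
after = torch.cuda.current_stream()

# Compare the raw cudaStream_t pointers; they should be identical.
assert before.cuda_stream == after.cuda_stream, \
    "Caffe2 op changed the current CUDA stream"
```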
Environment
I reproduced this issue on both PyTorch 1.5.0 and 1.4.0.
PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: Tesla V100-SXM2-32GB
Nvidia driver version: 418.116.00
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] torch==1.5.0
[pip] torchvision==0.6.0a0+82fd1c8
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h6bb024c_0
[conda] mkl 2020.0 166
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] numpy 1.18.1 py37h4f9e942_0
[conda] numpy-base 1.18.1 py37hde5b4d6_1
[conda] pytorch 1.5.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] torchvision 0.6.0 py37_cu101 pytorch
Additional context
I found this issue while trying to deploy Faster R-CNN from Detectron2. The first output of the deployed model nondeterministically differs from the later outputs:
model = torch.jit.load('model.ts')
y1 = model(x)
y2 = model(x)
y3 = model(x)
# y1 != y2  (the first output differs, nondeterministically)
# y2 == y3  (subsequent outputs agree)
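For reference, a sketch of how the outputs can be compared; `outputs_equal` is a hypothetical helper, and it assumes the traced model returns a tensor or a flat tuple of tensors:

```python
import torch

def outputs_equal(a, b):
    # Normalize a tensor / tuple-of-tensors output and compare exactly.
    ta = a if isinstance(a, (list, tuple)) else (a,)
    tb = b if isinstance(b, (list, tuple)) else (b,)
    return len(ta) == len(tb) and all(torch.equal(p, q) for p, q in zip(ta, tb))

print(outputs_equal(y1, y2))  # False: the first call is corrupted nondeterministically
print(outputs_equal(y2, y3))  # True
```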
The model calls torch.ops._caffe2.GenerateProposals() internally (code). I profiled this call together with the rpn_head() call it depends on, and I could see the unexpected CUDA stream switch at GenerateProposals in the trace.
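Roughly how the profiling was done (a sketch, not the exact commands; `model` and `x` are from the snippet above):

```python
import torch

# Profile one forward pass with CUDA timing enabled, then inspect the trace.
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    y = model(x)

print(prof.key_averages().table(sort_by="cuda_time_total"))
prof.export_chrome_trace("trace.json")  # view in chrome://tracing
```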
One possible workaround is to call an arbitrary Caffe2 operation at the beginning of the model, so that the stream switch happens before any real work is launched:
def forward(self, input):
    # Dummy Caffe2 call; the result is discarded, it only triggers the stream switch.
    torch.ops._caffe2.AliasWithName(input, 'input')
    return self.original_model(input)
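A sketch of how this can be wired up around the loaded TorchScript model; `StreamFixWrapper` is an illustrative name, not something from Detectron2 or PyTorch:

```python
import torch

class StreamFixWrapper(torch.nn.Module):
    """Workaround wrapper: runs a cheap Caffe2 op before the real model."""

    def __init__(self, original_model):
        super().__init__()
        self.original_model = original_model

    def forward(self, input):
        # The output of AliasWithName is intentionally discarded; the call
        # exists only to switch streams before the wrapped model runs.
        torch.ops._caffe2.AliasWithName(input, 'input')
        return self.original_model(input)

model = StreamFixWrapper(torch.jit.load('model.ts'))
y = model(x)  # `x` as in the snippet above
```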
cc @ngimel
Activity
glaringlee commented on Jun 15, 2020
@sublee
Can you switch to PyTorch APIs? Caffe2 is being deprecated and migrated into the PyTorch codebase.
sublee commented on Jun 16, 2020
Thanks for the reply. Since using Caffe2 is a choice made by Detectron2 (I don't understand why), I think it's hard for me to switch to proper PyTorch APIs. If this is a Caffe2+PyTorch bug but it won't be fixed, I'll handle my case with the workaround I described above.