Adds DLPack support #57110
Conversation
💊 CI failures summary (as of commit 2059638): 💚 Looks good so far! There are no failures yet.
It shouldn't require a public object I think - I suspect that'd be a bigger discussion (not sure though, @mruberry?). Looking at the CuPy version, I think you want a private version (implemented in C++ and with Python bindings so you can use it in Python), like CuPy's `ExternalStream`:

```python
class ExternalStream(BaseStream):
    """CUDA stream.

    This class allows to use external streams in CuPy by providing the
    stream pointer obtained from the CUDA runtime call.
    The user is in charge of managing the life-cycle of the stream.

    Args:
        ptr (intptr_t): Address of the `cudaStream_t` object.

    Attributes:
        ~Stream.ptr (intptr_t): Raw stream handle.
    """

    def __init__(self, ptr):
        self.ptr = ptr
```

Is that about what you were thinking?
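For context, this is roughly how such an external-stream wrapper gets used on the CuPy side; a minimal sketch, assuming a CUDA-enabled CuPy and PyTorch install (not part of this PR):

```python
import cupy
import torch

# Wrap a raw stream handle owned by PyTorch so that CuPy can enqueue work
# on the same stream; CuPy never takes ownership of the handle.
torch_stream = torch.cuda.Stream()
cupy_stream = cupy.cuda.ExternalStream(torch_stream.cuda_stream)
with cupy_stream:
    a = cupy.arange(10) * 2  # launched on the PyTorch-owned stream
```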
Thanks @emcastillo, looks like a great start.
torch/_tensor.py (outdated):

```python
        a `synchronize` method. Optional.
    """
    if isinstance(stream, torch.cuda.Stream) or hasattr(stream, 'synchronize'):
        stream.synchronize()
```
This shouldn't be necessary, but maybe the right API is missing at the Python level here to do this asynchronously? From dmlc/dlpack#57 (comment):

"In cases where both sides use their own stream, async exchange can still be done by stream dependency queuing":

```cpp
// The event can also be created on the fly, or create a synchronizer object and cache it.
// We could build an auxiliary function that can be called from the Python side if that helps the frameworks.
void PushStreamDep(cudaStream_t src, cudaStream_t dst) {
  cudaEvent_t event;
  cudaEventCreate(&event);
  cudaEventRecord(event, src);
  cudaStreamWaitEvent(dst, event, 0);
  cudaEventDestroy(event);
}
```

Regarding the …
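For reference, the same dependency queueing can be expressed at the Python level with CUDA events, without growing the libtorch API; a minimal sketch, assuming both streams are `torch.cuda.Stream` objects:

```python
import torch

def push_stream_dep(src: torch.cuda.Stream, dst: torch.cuda.Stream) -> None:
    # Record an event on the producer stream and make the consumer stream
    # wait on it; neither call blocks the host.
    event = torch.cuda.Event()
    event.record(src)
    dst.wait_event(event)
```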
Keeping things private to start, if possible, is always preferable as it gives us more flexibility in the future. If there's a compelling reason to make it public we can always do so, of course, but you'll have to educate me ;)
Summary: This is required in #57110 (comment). We need to provide means to synchronize on externally allocated streams for DLPack support in the Python array data API.

cc mruberry rgommers leofang asi1024 kmaehashi

Pull Request resolved: #57781
Reviewed By: mrshenli
Differential Revision: D28326365
Pulled By: ezyang
fbshipit-source-id: b67858c8033949951b49a3d319f649884dfd0a91
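As a quick illustration of what #57781 enables; a hypothetical snippet (in practice the raw handle would come from another library, e.g. a CuPy stream's `ptr`, rather than from a second torch stream as done here to keep the example self-contained):

```python
import torch

# Pretend this raw cudaStream_t handle came from another library.
producer = torch.cuda.Stream()
raw_ptr = producer.cuda_stream  # plain integer handle

# Wrap it; PyTorch does not take ownership of the stream's lifetime.
ext = torch.cuda.ExternalStream(raw_ptr)
with torch.cuda.stream(ext):
    y = torch.ones(4, device='cuda') * 2  # enqueued on the external stream
```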
@rgommers I updated the PR after #59527 was merged :). Thank you!
@kmaehashi and I would like to raise this discussion: as demonstrated in @emcastillo's current design, `from_dlpack` accepts not only objects implementing the `__dlpack__` protocol but also old-style DLPack capsules.
Thanks for bringing this up @leofang. This seems fine to me, since it's a superset of what's in the array API standard, so there's no conflict. Most libraries will support a superset for other functionality as well, for historical or other reasons. I would recommend that if this is done, the documentation emphasizes that the capsule approach is there only for convenience, to support libraries that support the old-style capsule interface.
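For concreteness, the behaviour under discussion is roughly the following dispatch; a sketch only, where `_capsule_to_tensor` is a hypothetical stand-in for the binding that actually consumes the capsule:

```python
def from_dlpack(ext):
    if hasattr(ext, '__dlpack__'):
        # New-style: the producer implements the array API protocol.
        device_type, device_id = ext.__dlpack_device__()
        # A real implementation would pass the consumer's stream for
        # CUDA/ROCm devices so the producer can order its work against it.
        capsule = ext.__dlpack__()
    else:
        # Old-style: assume `ext` is already a DLPack capsule.
        capsule = ext
    return _capsule_to_tensor(capsule)  # hypothetical low-level converter
```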
Thanks @emcastillo.

> I also tried to implement the two-stream synchronization, but did it on the Python side instead of in C++. I didn't think that growing the libtorch API with a function that would be used only in this case was worth it if it could be done on the Python side.

I agree, this makes sense. Having `ExternalStream` available makes this change quite nice and small.
```python
# CPU = 1 CPU_PINNED = 3 OPENCL = 4 VULKAN = 7
# METAL = 8 VPI = 9
dlpack_ids = {'cpu': 1, 'cuda': 2, 'rocm': 10}
idx = self.device.index if self.device.index is not None else 0
```
This still has TODOs. I think it would be nice if this returned `Tuple[enum.IntEnum, int]`, as in the spec: https://data-apis.org/array-api/latest/API_specification/array_object.html#dlpack-device-self
This is a bit out of scope, but if we were to support these other devices, how would the stream support work? Should it be ignored in environments where a stream does not make any sense?
I think what @rgommers meant is to change the return type of this function:

```python
def __dlpack_device__(self) -> Tuple[enum.IntEnum, int]:
```

> This is a bit out of scope, but if we were to support these other devices, how would the stream support work? Should it be ignored in environments where a stream does not make any sense?

For `__dlpack_device__`, whether a device has the concept of a stream/queue doesn't matter. For `__dlpack__`, `stream` can be `Any`:
https://data-apis.org/array-api/latest/API_specification/array_object.html#dlpack-self-stream-none
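A sketch of what that return type could look like, using the DLPack device codes already listed in the diff above (the enum names follow the DLPack header; this is illustrative, not necessarily the final PyTorch code):

```python
import enum
from typing import Tuple

import torch

class DLDeviceType(enum.IntEnum):
    # Subset of device codes from the DLPack header.
    kDLCPU = 1
    kDLCUDA = 2
    kDLCUDAHost = 3   # CPU_PINNED
    kDLOpenCL = 4
    kDLVulkan = 7
    kDLMetal = 8
    kDLVPI = 9
    kDLROCM = 10

def dlpack_device(tensor: torch.Tensor) -> Tuple[enum.IntEnum, int]:
    idx = tensor.device.index if tensor.device.index is not None else 0
    if tensor.device.type == 'cuda':
        # torch.version.hip is set on ROCm builds, None otherwise.
        dev_type = DLDeviceType.kDLROCM if torch.version.hip else DLDeviceType.kDLCUDA
    else:
        dev_type = DLDeviceType.kDLCPU
    return (dev_type, idx)
```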
I think I addressed all the review concerns and also added some small tests to verify the behavior.
There's a bunch of test failures. Not sure about the … For the …:

```python
if has_torch_function_unary(self):
    return handle_torch_function(Tensor.__dlpack__, (self,), self, stream)
```

One other thing to discuss: …
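For reference, combining the guard above with the stream handling discussed earlier, the producer-side method could look roughly like this; a sketch under this thread's assumptions (in particular that `stream` is the consumer's raw CUDA stream handle), not necessarily the code as merged:

```python
import torch
from torch import Tensor
from torch.overrides import has_torch_function_unary, handle_torch_function
from torch.utils.dlpack import to_dlpack

def __dlpack__(self, stream=None):
    # Intended to live on torch.Tensor; shown standalone for illustration.
    if has_torch_function_unary(self):
        return handle_torch_function(Tensor.__dlpack__, (self,), self, stream)
    if stream is not None and self.device.type == 'cuda':
        # Make the consumer-provided stream wait on work already enqueued
        # on the producer's current stream, without blocking the host.
        consumer = torch.cuda.ExternalStream(stream)
        if consumer != torch.cuda.current_stream():
            event = torch.cuda.Event()
            event.record(torch.cuda.current_stream())
            consumer.wait_event(event)
    return to_dlpack(self)
```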
This looks great, thanks @emcastillo!
Thank you all for your advice and for taking the time to thoroughly review my horrible initial implementation 😅
Doh! Sorry, @emcastillo, looks like this picked up a merge conflict. Would you just rebase it and ping me so I can merge it?
@mruberry rebased!
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
```diff
@@ -759,6 +759,8 @@ def compiled_with_cxx11_abi():
 quantized_lstm = torch.ops.aten.quantized_lstm
 quantized_gru = torch.ops.aten.quantized_gru
 
+from torch.utils.dlpack import from_dlpack, to_dlpack
```
Adding some docs for DLPack in gh-70437. I'd forgotten about this. Looked back at my review comment, and it mentioned just `from_dlpack`. I think adding `to_dlpack` to the main namespace was probably unnecessary, given that it shouldn't be used when all libraries support `__dlpack__`. Anyway, no action needed - mostly a comment to self.
Partially Fixes #55090
Depends on #55365
Inspired by dmlc/dlpack#57 (comment)

Questions: in PyTorch we can't create streams or easily synchronize them from just an integer. Should we add an `ExternalStream` object like the one we have in CuPy?

TODO: Add tests

Would like some feedback, as this design needs quite a few iterations.
@rgommers @leofang

cc @mruberry @rgommers @pmeier @asmeurer @leofang @AnirudhDagar @asi1024 @emcastillo @kmaehashi @heitorschueroff