Adds DLPack support #57110
Conversation
💊 CI failures summary (as of commit 2059638): 💚 Looks good so far! There are no failures yet.
It shouldn't require a public object I think - I suspect that'd be a bigger discussion (not sure though, @mruberry?). Looking at the CuPy version, I think you want a private version (implemented in C++ and with Python bindings so you can use it in Python), like CuPy's `ExternalStream`:

```python
class ExternalStream(BaseStream):
    """CUDA stream.

    This class allows to use external streams in CuPy by providing the
    stream pointer obtained from the CUDA runtime call.
    The user is in charge of managing the life-cycle of the stream.

    Args:
        ptr (intptr_t): Address of the `cudaStream_t` object.

    Attributes:
        ~Stream.ptr (intptr_t): Raw stream handle.
    """

    def __init__(self, ptr):
        self.ptr = ptr
```

Is that about what you were thinking?
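For context, this is roughly how such an external-stream wrapper gets used on the CuPy side; a minimal sketch, assuming a CUDA-enabled CuPy and PyTorch install (not part of this PR):

```python
import cupy
import torch

# Wrap a raw stream handle owned by PyTorch so that CuPy can enqueue work
# on the same stream; CuPy never takes ownership of the handle.
torch_stream = torch.cuda.Stream()
cupy_stream = cupy.cuda.ExternalStream(torch_stream.cuda_stream)
with cupy_stream:
    a = cupy.arange(10) * 2  # launched on the PyTorch-owned stream
```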
Thanks @emcastillo, looks like a great start.
torch/_tensor.py (outdated):

```python
        a `synchronize` method. Optional.
    """
    if isinstance(stream, torch.cuda.Stream) or hasattr(stream, 'synchronize'):
        stream.synchronize()
```
This shouldn't be necessary, but maybe the right API is missing at the Python level here to do this asynchronously? From dmlc/dlpack#57 (comment):

"In cases where both sides use their own stream, async exchange can still be done by stream dependency queuing":

```cpp
// The event can also be created on the fly, or create a synchronizer object and cache it.
// We could build an auxiliary function that can be called from the Python side if that helps the frameworks.
void PushStreamDep(cudaStream_t src, cudaStream_t dst) {
  cudaEvent_t event;
  cudaEventCreate(&event);
  cudaEventRecord(event, src);
  cudaStreamWaitEvent(dst, event, 0);
  cudaEventDestroy(event);
}
```

Regarding the …
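For reference, the same dependency queueing can be expressed at the Python level with CUDA events, without growing the libtorch API; a minimal sketch, assuming both streams are `torch.cuda.Stream` objects:

```python
import torch

def push_stream_dep(src: torch.cuda.Stream, dst: torch.cuda.Stream) -> None:
    # Record an event on the producer stream and make the consumer stream
    # wait on it; neither call blocks the host.
    event = torch.cuda.Event()
    event.record(src)
    dst.wait_event(event)
```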
Keeping things private to start, if possible, is always preferable as it gives us more flexibility in the future. If there's a compelling reason to make it public we can always do so, of course, but you'll have to educate me ;)
Summary: This is required in #57110 (comment). We need to provide means to synchronize on externally allocated streams for DLPack support in the Python array data API.

cc mruberry rgommers leofang asi1024 kmaehashi

Pull Request resolved: #57781
Reviewed By: mrshenli
Differential Revision: D28326365
Pulled By: ezyang
fbshipit-source-id: b67858c8033949951b49a3d319f649884dfd0a91
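As a quick illustration of what #57781 enables; a hypothetical snippet (in practice the raw handle would come from another library, e.g. a CuPy stream's `ptr`, rather than from a second torch stream as done here to keep the example self-contained):

```python
import torch

# Pretend this raw cudaStream_t handle came from another library.
producer = torch.cuda.Stream()
raw_ptr = producer.cuda_stream  # plain integer handle

# Wrap it; PyTorch does not take ownership of the stream's lifetime.
ext = torch.cuda.ExternalStream(raw_ptr)
with torch.cuda.stream(ext):
    y = torch.ones(4, device='cuda') * 2  # enqueued on the external stream
```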
@rgommers I updated the PR after #59527 was merged :). Thank you!
@kmaehashi and I would like to raise this discussion: as demonstrated in @emcastillo's current design, `from_dlpack` accepts not only objects implementing the `__dlpack__` protocol but also old-style DLPack capsules.
Thanks for bringing this up @leofang. This seems fine to me, since it's a superset of what's in the array API standard, so there's no conflict. Most libraries will support a superset for other functionality as well, for historical or other reasons. I would recommend that if this is done, the documentation emphasizes that the capsule approach is there only for convenience, to support libraries that support the old-style capsule interface.
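For concreteness, the behaviour under discussion is roughly the following dispatch; a sketch only, where `_capsule_to_tensor` is a hypothetical stand-in for the binding that actually consumes the capsule:

```python
def from_dlpack(ext):
    if hasattr(ext, '__dlpack__'):
        # New-style: the producer implements the array API protocol.
        device_type, device_id = ext.__dlpack_device__()
        # A real implementation would pass the consumer's stream for
        # CUDA/ROCm devices so the producer can order its work against it.
        capsule = ext.__dlpack__()
    else:
        # Old-style: assume `ext` is already a DLPack capsule.
        capsule = ext
    return _capsule_to_tensor(capsule)  # hypothetical low-level converter
```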
Thanks @emcastillo.

> I also tried to implement the two-stream synchronization, but did it on the Python side instead of in C++. I didn't think that growing the libtorch API with a function that would be used only in this case was worth it if it could be done on the Python side.

I agree, this makes sense. Having `ExternalStream` available makes this change quite nice and small.
```python
# CPU = 1 CPU_PINNED = 3 OPENCL = 4 VULKAN = 7
# METAL = 8 VPI = 9
dlpack_ids = {'cpu': 1, 'cuda': 2, 'rocm': 10}
idx = self.device.index if self.device.index is not None else 0
```
This still has TODOs. I think it would be nice if this returned `Tuple[enum.IntEnum, int]`, as in the spec: https://data-apis.org/array-api/latest/API_specification/array_object.html#dlpack-device-self
This is a bit out of scope, but if we were to support these other devices, how would the stream support work? Should it be ignored in environments where a stream does not make any sense?
I think what @rgommers meant is to change the return type of this function:

```python
def __dlpack_device__(self) -> Tuple[enum.IntEnum, int]:
```

> This is a bit out of scope, but if we were to support these other devices, how would the stream support work? Should it be ignored in environments where a stream does not make any sense?

For `__dlpack_device__`, whether a device has the concept of a stream/queue doesn't matter. For `__dlpack__`, `stream` can be `Any`:
https://data-apis.org/array-api/latest/API_specification/array_object.html#dlpack-self-stream-none
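A sketch of what that return type could look like, using the DLPack device codes already listed in the diff above (the enum names follow the DLPack header; this is illustrative, not necessarily the final PyTorch code):

```python
import enum
from typing import Tuple

import torch

class DLDeviceType(enum.IntEnum):
    # Subset of device codes from the DLPack header.
    kDLCPU = 1
    kDLCUDA = 2
    kDLCUDAHost = 3   # CPU_PINNED
    kDLOpenCL = 4
    kDLVulkan = 7
    kDLMetal = 8
    kDLVPI = 9
    kDLROCM = 10

def dlpack_device(tensor: torch.Tensor) -> Tuple[enum.IntEnum, int]:
    idx = tensor.device.index if tensor.device.index is not None else 0
    if tensor.device.type == 'cuda':
        # torch.version.hip is set on ROCm builds, None otherwise.
        dev_type = DLDeviceType.kDLROCM if torch.version.hip else DLDeviceType.kDLCUDA
    else:
        dev_type = DLDeviceType.kDLCPU
    return (dev_type, idx)
```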
I think I addressed all the review concerns and also added some small tests to verify the behavior.
There's a bunch of test failures. Not sure about the … For the …:

```python
if has_torch_function_unary(self):
    return handle_torch_function(Tensor.__dlpack__, (self,), self, stream)
```

One other thing to discuss: …
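For reference, combining the guard above with the stream handling discussed earlier, the producer-side method could look roughly like this; a sketch under this thread's assumptions (in particular that `stream` is the consumer's raw CUDA stream handle), not necessarily the code as merged:

```python
import torch
from torch import Tensor
from torch.overrides import has_torch_function_unary, handle_torch_function
from torch.utils.dlpack import to_dlpack

def __dlpack__(self, stream=None):
    # Intended to live on torch.Tensor; shown standalone for illustration.
    if has_torch_function_unary(self):
        return handle_torch_function(Tensor.__dlpack__, (self,), self, stream)
    if stream is not None and self.device.type == 'cuda':
        # Make the consumer-provided stream wait on work already enqueued
        # on the producer's current stream, without blocking the host.
        consumer = torch.cuda.ExternalStream(stream)
        if consumer != torch.cuda.current_stream():
            event = torch.cuda.Event()
            event.record(torch.cuda.current_stream())
            consumer.wait_event(event)
    return to_dlpack(self)
```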
This looks great, thanks @emcastillo!
Thank you all for your advice and for taking the time to thoroughly review my horrible initial implementation 😅
Doh! Sorry, @emcastillo, looks like this picked up a merge conflict. Would you just rebase it and ping me so I can merge it?
@mruberry rebased!
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
```diff
@@ -759,6 +759,8 @@ def compiled_with_cxx11_abi():
 quantized_lstm = torch.ops.aten.quantized_lstm
 quantized_gru = torch.ops.aten.quantized_gru
 
+from torch.utils.dlpack import from_dlpack, to_dlpack
```
Adding some docs for DLPack in gh-70437. I'd forgotten about this. Looked back at my review comment, and it mentioned just `from_dlpack`. I think adding `to_dlpack` to the main namespace was probably unnecessary, given that it shouldn't be used when all libraries support `__dlpack__`. Anyway, no action needed - mostly a comment to self.
Partially Fixes #55090
Depends on #55365
Inspired by dmlc/dlpack#57 (comment)

Questions: in PyTorch we can't create streams or easily synchronize them from just an integer. Should we add an `ExternalStream` object like the one we have in CuPy?

TODO: Add tests

Would like some feedback, as this design needs quite a few iterations.
@rgommers @leofang

cc @mruberry @rgommers @pmeier @asmeurer @leofang @AnirudhDagar @asi1024 @emcastillo @kmaehashi @heitorschueroff