Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-43541: [C++] Check accepted device allocation types before executing kernel #43542

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

felipecrv
Copy link
Contributor

@felipecrv felipecrv commented Aug 2, 2024

Rationale for this change

Kernels shouldn't segfault when executing against GPU-allocated (or any other non-CPU device) Arrays.

What changes are included in this PR?

  • A way of declaring which device allocation types a kernel can handle per parameter
  • Removal of some of the checks from the Python code
  • Making drop_null callable when array is CUDA-allocated

Are these changes tested?

Tested from the Python tests.

Are there any user-facing changes?

New member functions to some classes.

Copy link

github-actions bot commented Aug 2, 2024

⚠️ GitHub issue #43541 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting committer review Awaiting committer review awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Aug 2, 2024
Comment on lines +438 to +440
case Datum::RECORD_BATCH:
case Datum::TABLE:
Copy link
Contributor Author

@felipecrv felipecrv Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can probably handle these even though they are not passed here.

@@ -430,6 +431,25 @@ bool InputType::Matches(const Datum& value) const {
return Matches(*value.type());
}

bool InputType::MatchesDeviceAllocationType(const Datum& value) const {
DCHECK(Matches(value));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will probably remove this DCHECK and keep the ones in KernelSignature::MatchesDeviceAllocationTypes.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Aug 2, 2024
@felipecrv felipecrv marked this pull request as ready for review August 27, 2024 00:32
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Aug 27, 2024
@felipecrv felipecrv requested a review from danepitkin August 27, 2024 00:35
@felipecrv
Copy link
Contributor Author

I know what is wrong with drop_null (failing in tests), but I'm still deciding the best way to fix it.

@@ -1646,7 +1630,6 @@ cdef class Array(_PandasConvertible):
-------
lst : list
"""
self._assert_cpu()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this one to not change (not using compute kernels under the hood)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled by __iter__ already. Isn't that what's called in the for x in self sugar?

@felipecrv felipecrv requested a review from bkietz August 27, 2024 13:54
Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks good but I'm not convinced about the changes to InputType; I don't know of any kernels which take non-CPU data so it seems more reasonable to leave the InputType refactor for a follow up. On the other hand, Users may have implemented their own kernels which do operate on non CPU data in which case this change to InputType is breaking (since those kernels were not defined with the new constructor and default to gracefully rejecting non-CPU data). I'll have to think on this some more

cpp/src/arrow/device_allocation_type.h Outdated Show resolved Hide resolved
Comment on lines 61 to 69
for (int i = 1; i <= kDeviceAllocationTypeMax; i++) {
if (device_type_bitset_.test(i)) {
// Skip all the unused values in the enum.
switch (i) {
case 0:
case 5:
case 6:
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (int i = 1; i <= kDeviceAllocationTypeMax; i++) {
if (device_type_bitset_.test(i)) {
// Skip all the unused values in the enum.
switch (i) {
case 0:
case 5:
case 6:
continue;
}
for (int i : {
DeviceAllocationType::kCPU,
DeviceAllocationType::kCUDA,
DeviceAllocationType::kCUDA_HOST,
DeviceAllocationType::kOPENCL,
DeviceAllocationType::kVULKAN,
DeviceAllocationType::kMETAL,
DeviceAllocationType::kVPI,
DeviceAllocationType::kROCM,
DeviceAllocationType::kROCM_HOST,
DeviceAllocationType::kEXT_DEV,
DeviceAllocationType::kCUDA_MANAGED,
DeviceAllocationType::kONEAPI,
DeviceAllocationType::kWEBGPU,
DeviceAllocationType::kHEXAGON,
}) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work when another enum entry is added. Mine will work and I put the kDeviceAllocationTypeMax right after the enum so people remember to not leave gaps.

cpp/src/arrow/chunked_array.h Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Aug 27, 2024
@felipecrv
Copy link
Contributor Author

felipecrv commented Aug 27, 2024

In general this looks good but I'm not convinced about the changes to InputType; I don't know of any kernels which take non-CPU data so it seems more reasonable to leave the InputType refactor for a follow up.

@bkietz I can leave it to a follow up, but that would force me to put ad-hoc device type checks in the kernel dispatching code instead of doing it in a systematic way.

On the other hand, Users may have implemented their own kernels which do operate on non CPU data in which case this change to InputType is breaking (since those kernels were not defined with the new constructor and default to gracefully rejecting non-CPU data). I'll have to think on this some more

These users were already on thin ice: all it takes is a GetNullCount() call in kernel dispatching logic to crash. The bright side is that I already give them a chance to avoid the check by passing the device types in the type matcher. This is why I didn't go for definitive "is cpu" checks and instead allowed extension of the set.

@felipecrv
Copy link
Contributor Author

@bkietz I extracted the basics from this PR into a new PR -> #43853

@apache apache deleted a comment from github-actions bot Aug 27, 2024
Copy link
Member

@danepitkin danepitkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyArrow bits LGTM!

DeviceAllocationTypeSet device_types() const;

/// \return true if all chunks are allocated on CPU-accessible memory.
bool is_cpu() const { return device_types().is_cpu_only(); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth cacheing this after the first execution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might cache the entire set on the constructor. This check is very cheap, so I wouldn't cache in the C++ layer. The set is represented as a 64-bit word under the hood. It's very small.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Aug 27, 2024
@felipecrv
Copy link
Contributor Author

@danepitkin after @bkietz I extracted some of the commits from here into a smaller PR that only adds the device_types() methods, doesn't constrain kernel execution, and doesn't touch the python code: #43853

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants