-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tidy up memmove/memcpy APIs #7737
base: main
Are you sure you want to change the base?
Conversation
gpuci run tests |
The gpuCI fail is unrelated to this PR and is only the result of an unexpected success due to the latest CUDA Python bindings release fixing an upstream issue (profiling APIs are now available) - I'll be making a separate PR to fix this. Otherwise, all tests pass so this is ready for review now. |
gpuci run tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for undertaking this refactoring. The interface is a lot more consistent with the signatures and also with the functions themselves now! I've left a few small comments to address inline but otherwise looks good. Thanks again!
numba/core/cgutils.py
Outdated
in_ptr = builder.gep(src, [loop.index]) | ||
builder.store(builder.load(in_ptr), out_ptr) | ||
if dst.type != src.type: | ||
msg = f'memcpy requires the same types; got {dst.type} and {src.type}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
msg = f'memcpy requires the same types; got {dst.type} and {src.type}' | |
msg = f'memcpy requires the same types; got source type {src.type} and destination type {dst.type}' |
would it be useful to specify which way around these are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so. I'll keep it destination-first though as that's the first argument to memcpy
.
numba/core/cgutils.py
Outdated
builder.store(builder.load(in_ptr), out_ptr) | ||
if dst.type != src.type: | ||
msg = f'memcpy requires the same types; got {dst.type} and {src.type}' | ||
raise TypingError(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a typing problem? Think this should be a lowering error, the caller has passed in invalid llvm types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should this be tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I've looked into it, there's no requirement in LLVM for the types to be the same (https://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic) - this was only needed by the old hand-rolled implementation. In this PR I added the check to enforce the description in the docstring, but I now see it's not necessary I'll remove it from this PR.
numba/core/cgutils.py
Outdated
Source and destination types can be any type, but the types must be the | ||
same. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this contain a similar check for source type matching dest type as present in the memcpy
impl above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this implementation was OK (although the docstring is wrong, and will be updated).
numba/core/cgutils.py
Outdated
if isinstance(itemsize, ir.Constant) and itemsize.constant == 1: | ||
length = count | ||
else: | ||
length = builder.mul(count, itemsize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think LLVM would just optimise a multiple by constant one away, but think it's fine to leave this as-is too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now you mention it I do think it would be better to have simpler code here, so I'll make it just always generate the mul instead of trying to be clever.
numba/core/cgutils.py
Outdated
else: | ||
length = builder.mul(count, itemsize) | ||
|
||
memcpy = builder.module.declare_intrinsic(func_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
memcpy = builder.module.declare_intrinsic(func_name, | |
func_impl = builder.module.declare_intrinsic(func_name, |
(and refactor, this interface is intended to be generic for memcpy
and memmove
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will refactor to make this generic.
gpuci run tests |
This issue occurred after the merge from main due to logically-conflicting (but not conflicting in the sense of the merge) changes in main and the issue-7734 branch.
gpuci run tests |
1 similar comment
gpuci run tests |
gpuci run tests |
@stuartarchibald Many thanks for the review and comments. Everything is fixed up from your review, but as can be seen from gpuCI there's a new issue from things that were added to main since I made this PR that I've not yet resolved. Annoyingly it passes on my machines, but only fails on gpuCI so it's taking a little longer than I hoped to resolve this. |
gpuci run tests |
OK, I got to the bottom of this issue. Factors involved in the problem / fix are:
So, the next steps to get this finished are:
|
Rather than attempt to fix support for NVVM 3.4, instead I'll wait to revisit this when NVVM 3.4 support is dropped. |
This cleans up the
memcpy
/memmove
APIs and implementation.memcpy
implementation, instead of one using the intrinsic and one using a loop - the implementation always uses the intrinsic now.align
parameter is removed, which was probably a vestige from an earlier LLVM version and only served to confuse as it was not used.raw_memcpy
/raw_memmove
withmemcpy
/memmove
.Fixes #7734.