Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tidy up memmove/memcpy APIs #7737

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Tidy up memmove/memcpy APIs #7737

wants to merge 12 commits into from

Conversation

gmarkall
Copy link
Member

@gmarkall gmarkall commented Jan 13, 2022

This cleans up the memcpy / memmove APIs and implementation.

  • There is now only one memcpy implementation, instead of one using the intrinsic and one using a loop - the implementation always uses the intrinsic now.
  • The align parameter is removed, which was probably a vestige from an earlier LLVM version and only served to confuse as it was not used.
  • The APIs permit using a form where either the length is supplied, or a count and item size are supplied - if the count and item size are supplied, and appropriate multiplication is generated.
  • Updated docstrings for these functions.
  • Replaced uses of raw_memcpy / raw_memmove with memcpy / memmove.

Fixes #7734.

@gmarkall gmarkall mentioned this pull request Jan 14, 2022
@gmarkall
Copy link
Member Author

gpuci run tests

@gmarkall
Copy link
Member Author

The gpuCI fail is unrelated to this PR and is only the result of an unexpected success due to the latest CUDA Python bindings release fixing an upstream issue (profiling APIs are now available) - I'll be making a separate PR to fix this. Otherwise, all tests pass so this is ready for review now.

@gmarkall gmarkall marked this pull request as ready for review January 14, 2022 10:06
@gmarkall gmarkall changed the title [WIP] Tidy up memmove/memcpy APIs Tidy up memmove/memcpy APIs Jan 14, 2022
@gmarkall gmarkall added 3 - Ready for Review Effort - medium Medium size effort needed and removed 2 - In Progress labels Jan 17, 2022
@gmarkall
Copy link
Member Author

gpuci run tests

Copy link
Contributor

@stuartarchibald stuartarchibald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for undertaking this refactoring. The interface is a lot more consistent with the signatures and also with the functions themselves now! I've left a few small comments to address inline but otherwise looks good. Thanks again!

in_ptr = builder.gep(src, [loop.index])
builder.store(builder.load(in_ptr), out_ptr)
if dst.type != src.type:
msg = f'memcpy requires the same types; got {dst.type} and {src.type}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
msg = f'memcpy requires the same types; got {dst.type} and {src.type}'
msg = f'memcpy requires the same types; got source type {src.type} and destination type {dst.type}'

would it be useful to specify which way around these are?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. I'll keep it destination-first though as that's the first argument to memcpy.

builder.store(builder.load(in_ptr), out_ptr)
if dst.type != src.type:
msg = f'memcpy requires the same types; got {dst.type} and {src.type}'
raise TypingError(msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a typing problem? Think this should be a lowering error, the caller has passed in invalid llvm types?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should this be tested?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I've looked into it, there's no requirement in LLVM for the types to be the same (https://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic) - this was only needed by the old hand-rolled implementation. In this PR I added the check to enforce the description in the docstring, but I now see it's not necessary I'll remove it from this PR.

Comment on lines 1040 to 1041
Source and destination types can be any type, but the types must be the
same.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this contain a similar check for source type matching dest type as present in the memcpy impl above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this implementation was OK (although the docstring is wrong, and will be updated).

Comment on lines 1058 to 1061
if isinstance(itemsize, ir.Constant) and itemsize.constant == 1:
length = count
else:
length = builder.mul(count, itemsize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think LLVM would just optimise a multiple by constant one away, but think it's fine to leave this as-is too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now you mention it I do think it would be better to have simpler code here, so I'll make it just always generate the mul instead of trying to be clever.

else:
length = builder.mul(count, itemsize)

memcpy = builder.module.declare_intrinsic(func_name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
memcpy = builder.module.declare_intrinsic(func_name,
func_impl = builder.module.declare_intrinsic(func_name,

(and refactor, this interface is intended to be generic for memcpy and memmove)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will refactor to make this generic.

@stuartarchibald stuartarchibald added 4 - Waiting on author Waiting for author to respond to review and removed 3 - Ready for Review labels Feb 2, 2022
@gmarkall
Copy link
Member Author

gmarkall commented Mar 9, 2022

gpuci run tests

This issue occurred after the merge from main due to
logically-conflicting (but not conflicting in the sense of the merge)
changes in main and the issue-7734 branch.
@gmarkall
Copy link
Member Author

gmarkall commented Mar 9, 2022

gpuci run tests

1 similar comment
@gmarkall
Copy link
Member Author

gmarkall commented Mar 9, 2022

gpuci run tests

@gmarkall
Copy link
Member Author

gmarkall commented Mar 9, 2022

gpuci run tests

@gmarkall
Copy link
Member Author

gmarkall commented Mar 9, 2022

@stuartarchibald Many thanks for the review and comments. Everything is fixed up from your review, but as can be seen from gpuCI there's a new issue from things that were added to main since I made this PR that I've not yet resolved. Annoyingly it passes on my machines, but only fails on gpuCI so it's taking a little longer than I hoped to resolve this.

@gmarkall
Copy link
Member Author

gpuci run tests

@gmarkall
Copy link
Member Author

OK, I got to the bottom of this issue. Factors involved in the problem / fix are:

  • This PR replaces all uses of memcpy in Numba with calls to the @llvm.memcpy intrinsic.
  • The @llvm.memcpy intrinsic seems to assume its arguments are aligned to the alignment of an i8, which varies across architectures. (The docs are quite vague about this, they do not specify the assumption when the alignment is not specified - https://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic)
  • We need to specify alignment of the source and destination parameters to prevent this assumption.
  • In order to specify alignment, we need to add the align parameter attribute to the source and destination parameters.
  • llvmlite does not support the align parameter attribute.
  • The form of memcpy and memcmp differs between LLVM 3.4 and 7.0, so for CUDA Toolkits less than 11.2, an additional transformation needs adding to the NVVM IR to translate the parameter alignment attributes into an additional parameter (it looks like no code has ever generated @llvm.memcpy in conjunction with the CUDA target before.

So, the next steps to get this finished are:

  • Add support for the align parameter attribute in llvmlite.
  • Modify the implementation of _raw_mem_intrinsic to add alignment parameter attributes to the source and destination parameters.
  • Add a transform to rewrite the @llvm.memcpy intrinsic as-needed for NVVM 3.4.

@sklam sklam modified the milestones: Numba 0.56 RC, PR Backlog Jun 1, 2022
@gmarkall
Copy link
Member Author

Rather than attempt to fix support for NVVM 3.4, instead I'll wait to revisit this when NVVM 3.4 support is dropped.

@gmarkall gmarkall added Blocked awaiting long term feature For PRs/Issues that require the implementation of a long term plan feature and removed 4 - Waiting on author Waiting for author to respond to review labels Jun 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocked awaiting long term feature For PRs/Issues that require the implementation of a long term plan feature Effort - medium Medium size effort needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

memcpy_region looks suspiciously wrong
3 participants