Bye-bye thrust #856
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #856      +/-   ##
==========================================
- Coverage   91.84%   90.04%    -1.80%
==========================================
  Files          37       65       +28
  Lines        4976     9069     +4093
  Branches        0      946      +946
==========================================
+ Hits         4570     8166     +3596
- Misses        406      903      +497

☔ View full report in Codecov by Sentry.
Force-pushed from 8311e6c to 6052989 (Compare)
OK thrust is now gone in this branch. We have some performance penalty, but can probably fix that via some tuning? idk
I always find it annoying how Git sees one PR built on top of another PR as a giant pile of merge conflicts... You'd think they could resolve this automatically when the top commits have been merged to master.
fixing now
Looking great! Now I just need to make it compile in parallel on my M1 - I'll give that a try now.
        autoPolicy(tmpBuffer.size()), vertIdIter, vertIdIter + tmpBuffer.size(),
        tmpBuffer.begin(),
        [](size_t v) { return v != std::numeric_limits<size_t>::max(); });
gather(autoPolicy(tmpBuffer.size()), tmpBuffer.begin(), next,
Why do we now need a temporary buffer and an extra gather operation? I noticed the same pattern change up in boolean_result.
I think we need this for copy_if with stencil? Not sure how to do it otherwise.
Interesting, yeah I missed that. Fair enough, I can't think of a better way either.
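For context, a minimal, self-contained sketch of the pattern being discussed (hypothetical names, not the actual manifold code): thrust::copy_if had a stencil overload, but std::copy_if does not, so the stencil check moves into a predicate over an index range, the surviving indices go into a temporary buffer, and a separate gather step compacts the payload.

#include <algorithm>
#include <cstddef>
#include <iterator>
#include <limits>
#include <numeric>
#include <vector>

int main() {
  // Hypothetical payload plus a stencil marking removed entries with a sentinel.
  std::vector<int> payload = {10, 20, 30, 40};
  const size_t kRemoved = std::numeric_limits<size_t>::max();
  std::vector<size_t> stencil = {0, kRemoved, 2, kRemoved};

  // Step 1: copy_if over an index range. The stencil is consulted inside the
  // predicate, and the surviving indices land in a temporary buffer.
  std::vector<size_t> indices(payload.size());
  std::iota(indices.begin(), indices.end(), size_t{0});
  std::vector<size_t> kept;
  std::copy_if(indices.begin(), indices.end(), std::back_inserter(kept),
               [&](size_t i) { return stencil[i] != kRemoved; });

  // Step 2: the extra gather, pulling the surviving payload entries through
  // the temporary index buffer into a compacted output.
  std::vector<int> out(kept.size());
  std::transform(kept.begin(), kept.end(), out.begin(),
                 [&](size_t i) { return payload[i]; });
  // out is now {10, 30}.
  return static_cast<int>(out.size());
}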
src/utilities/include/par.h (outdated)
💯
And just so we know where to look - where have you noticed a performance penalty?
I need some more time to do rigorous profiling. I think I just noticed the unit test time increased by a bit, not much though.
After running the tests again, it seems that we are roughly 4% faster, not slower, regardless of whether
and we probably want to add the cmake config needed to check pstl version? for #787.
Okay, I've got OneDPL linked in now and I'm getting the same compiler problem on the CI as on my local Mac, so that should aid debugging a bit. I have no idea what this means:
On the plus side, it is actually trying to use the PSTL from DPL; I just don't know what it doesn't like.
I also don't understand all of the known limitations, or if we might be hitting any of them: https://oneapi-src.github.io/oneDPL/introduction.html#difference-with-standard-c-parallel-algorithms
are they just reduce and transform_reduce? can you get it working by changing them to sequential for the time being? just to see if oneDPL works.
Good call, it was only
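To illustrate the experiment suggested above, here is a minimal sketch (assuming plain C++17 execution policies rather than manifold's autoPolicy helper) of forcing just reduce and transform_reduce to run sequentially while everything else stays parallel:

#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

int main() {
  std::vector<double> v(1 << 20, 1.0);

  // The two calls under suspicion run sequentially for now.
  double sum = std::reduce(std::execution::seq, v.begin(), v.end(), 0.0);
  double sumSq = std::transform_reduce(std::execution::seq, v.begin(), v.end(),
                                       0.0, std::plus<>(),
                                       [](double x) { return x * x; });

  // Everything else can keep its parallel policy while the oneDPL issue is debugged.
  std::sort(std::execution::par, v.begin(), v.end());
  return sum == sumSq ? 0 : 1;
}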
Okay, now the Mac CI is just failing the cmake consumer test. What's the proper way to handle the OneDPL dependency? Do we link it in
This is the closest I've come to useful info about the
Feels like so. Maybe you can try to install oneDPL from source with the latest version and see if it works? I think we should probably file an upstream issue if this persists. Also, I wonder if we need PSTL at all. Most functions here are rather simple to implement with tbb.
and doing that will fix #787 as well. but probably a bit too much for today.
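As a rough illustration of that idea (a sketch only, not a proposal for the actual API), a reduction written directly against TBB with no PSTL layer in between could look something like this:

#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>

#include <cstddef>
#include <functional>
#include <vector>

// Parallel sum built on tbb::parallel_reduce directly.
double ParallelSum(const std::vector<double>& v) {
  return tbb::parallel_reduce(
      tbb::blocked_range<size_t>(0, v.size()), 0.0,
      [&](const tbb::blocked_range<size_t>& r, double local) {
        // Accumulate this thread's chunk, then let TBB combine the partial sums.
        for (size_t i = r.begin(); i != r.end(); ++i) local += v[i];
        return local;
      },
      std::plus<double>());
}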
@elalish things should work now.
Thank you! This is excellent; let's discuss the PSTL / parallel reduce stuff in a follow-up issue.
if(oneDPL_FOUND)
  target_link_libraries(${PROJECT_NAME} INTERFACE oneDPL)
else()
  message(WARNING "oneDPL not found, sequential implementation is used instead")
👍
I see the appeal, but in the name of maintainability, PSTL feels safer long term. Maybe something better than TBB comes along, like TBB overtook OpenMP? Maybe new architectures come along that need a new parallel driver (e.g. Google TPUs)? I feel like PSTL will have the best chance of supporting all these unknown futures with minimal effort on our part. But that said, I'm open to being convinced otherwise.
Fixes #809
Basically #823 but based on #852.