Bye-bye thrust #856

Merged: 13 commits, Jul 8, 2024

Conversation

@pca006132 (Collaborator) commented Jul 7, 2024

Fixes #809

Basically #823 but based on #852.

codecov bot commented Jul 7, 2024

Codecov Report

Attention: Patch coverage is 79.00000% with 21 lines in your changes missing coverage. Please review.

Project coverage is 90.04%. Comparing base (d437097) to head (036091a).
Report is 57 commits behind head on master.

Files Patch % Lines
src/manifold/src/properties.cpp 66.66% 6 Missing ⚠️
src/manifold/src/boolean_result.cpp 80.00% 4 Missing ⚠️
src/manifold/src/sort.cpp 20.00% 4 Missing ⚠️
src/utilities/include/iters.h 25.00% 3 Missing ⚠️
src/manifold/src/impl.cpp 83.33% 2 Missing ⚠️
src/utilities/include/par.h 96.29% 1 Missing ⚠️
src/utilities/include/utils.h 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #856      +/-   ##
==========================================
- Coverage   91.84%   90.04%   -1.80%     
==========================================
  Files          37       65      +28     
  Lines        4976     9069    +4093     
  Branches        0      946     +946     
==========================================
+ Hits         4570     8166    +3596     
- Misses        406      903     +497     


@pca006132 force-pushed the bye-thrust branch 2 times, most recently from 8311e6c to 6052989 (July 7, 2024 14:25)
@pca006132 mentioned this pull request Jul 7, 2024
@pca006132 (Collaborator, Author):

OK, Thrust is now gone in this branch. We have some performance penalty, but we can probably fix that via some tuning.

@elalish (Owner) commented Jul 7, 2024

I always find it annoying how Git sees one PR built on top of another PR as a giant pile of merge conflicts... You'd think they could resolve this automatically when the top commits have been merged to master.

@pca006132 (Collaborator, Author):

Fixing now.

@elalish (Owner) left a comment:

Looking great! Now I just need to make it compile in parallel on my M1 - I'll give that a try now.

// Review hunk; the enclosing call (truncated above) is a copy_if whose
// output feeds the gather below, per the discussion in this thread.
autoPolicy(tmpBuffer.size()), vertIdIter, vertIdIter + tmpBuffer.size(),
tmpBuffer.begin(),
[](size_t v) { return v != std::numeric_limits<size_t>::max(); });
gather(autoPolicy(tmpBuffer.size()), tmpBuffer.begin(), next,
@elalish (Owner):

Why do we now need a temporary buffer and an extra gather operation? I noticed the same pattern change up in boolean_result.

@pca006132 (Collaborator, Author):

I think we need this for copy_if with stencil? Not sure how to do it otherwise.

@elalish (Owner):

Interesting, yeah I missed that. Fair enough, I can't think of a better way either.
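
For reference, a minimal sketch of the two-step pattern being discussed, with hypothetical data and names (Thrust's copy_if accepts a stencil sequence directly; std::copy_if does not, hence the temporary index buffer followed by a gather):

#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

int main() {
  constexpr size_t kInvalid = std::numeric_limits<size_t>::max();
  // Hypothetical input: new vertex ids, with removed vertices marked invalid.
  std::vector<size_t> vertId = {0, kInvalid, 1, kInvalid, 2};
  std::vector<float> props = {10.f, 20.f, 30.f};

  // Step 1: std::copy_if has no stencil overload, so collect the surviving
  // ids into a temporary buffer first.
  std::vector<size_t> tmpBuffer(vertId.size());
  auto end = std::copy_if(vertId.begin(), vertId.end(), tmpBuffer.begin(),
                          [&](size_t v) { return v != kInvalid; });

  // Step 2: gather the payload at the surviving ids.
  std::vector<float> out;
  for (auto it = tmpBuffer.begin(); it != end; ++it) out.push_back(props[*it]);
}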

Two outdated review threads on src/manifold/src/properties.cpp (both resolved).

@elalish (Owner):

💯

@elalish (Owner) commented Jul 7, 2024

And just so we know where to look - where have you noticed a performance penalty?

@pca006132 (Collaborator, Author):

I need some more time to do rigorous profiling. I just noticed the unit test time increased a bit, though not by much.

@pca006132 (Collaborator, Author):

After running the tests again, it seems we are roughly 4% faster, not slower, regardless of whether MANIFOLD_DEBUG is on or off. Maybe my impression that it was slower was due to thermal throttling on my machine.

@pca006132 (Collaborator, Author):

And we probably want to add the CMake config needed to check the PSTL version, for #787.

@elalish (Owner) commented Jul 7, 2024

Okay, I've got oneDPL linked in now and I'm getting the same compiler problem on the CI as on my local Mac, so that should aid debugging a bit. I have no idea what this means:

/opt/homebrew/include/oneapi/dpl/internal/../pstl/glue_numeric_impl.h:43:12: error: call to function 'transform_reduce' that is neither visible in the template definition nor found by argument-dependent lookup
    return transform_reduce(::std::forward<_ExecutionPolicy>(__exec), __first, __last, __init, __binary_op,
           ^
/Users/runner/work/manifold/manifold/src/utilities/include/par.h:132:21: note: in instantiation of function template specialization 'oneapi::dpl::reduce<const oneapi::dpl::execution::parallel_unsequenced_policy &, glm::vec<3, float> *, glm::vec<3, float>, (lambda at /Users/runner/work/manifold/manifold/src/manifold/src/properties.cpp:329:58)>' requested here
STL_DYNAMIC_BACKEND(reduce, void)
                    ^
/Users/runner/work/manifold/manifold/src/manifold/src/properties.cpp:327:15: note: in instantiation of function template specialization 'manifold::reduce<glm::vec<3, float>, glm::vec<3, float> *, glm::vec<3, float> *, glm::vec<3, float>, (lambda at /Users/runner/work/manifold/manifold/src/manifold/src/properties.cpp:329:58)>' requested here
  bBox_.min = reduce<glm::vec3>(
              ^
/opt/homebrew/include/oneapi/dpl/pstl/glue_numeric_defs.h:60:1: note: 'transform_reduce' should be declared prior to the call site or in an associated namespace of one of its arguments
transform_reduce(_ExecutionPolicy&& __exec, _ForwardIterator __first, _ForwardIterator __last, _Tp __init,
^

On the plus side, it is actually trying to use the PSTL from oneDPL; I just don't know what it doesn't like.
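
For what it's worth, the error message has the shape of a classic two-phase-lookup failure: an unqualified call inside a template must either be visible at the template's definition or be found by argument-dependent lookup at instantiation. A minimal sketch of the same failure mode, with hypothetical names unrelated to oneDPL's internals:

namespace lib {
template <typename T>
T twice(T x) {
  // Unqualified, dependent call: resolved at instantiation, but only via
  // ADL in the associated namespaces of T; util below is not one of them.
  return double_it(x);
}
}  // namespace lib

namespace util {
inline int double_it(int x) { return 2 * x; }
}  // namespace util

int main() {
  // Uncommenting the next line reproduces the same diagnostic shape:
  // "call to function 'double_it' that is neither visible in the template
  // definition nor found by argument-dependent lookup".
  // return lib::twice(1);
  return util::double_it(1);
}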

@elalish (Owner) commented Jul 7, 2024

I also don't understand all of the known limitations, or if we might be hitting any of them: https://oneapi-src.github.io/oneDPL/introduction.html#difference-with-standard-c-parallel-algorithms

@pca006132 (Collaborator, Author):

Are they just reduce and transform_reduce? Can you get it working by changing them to sequential for the time being, just to see if oneDPL works?
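
A minimal sketch of that workaround, as a hypothetical stand-in for the project's reduce wrapper (not the actual par.h code): keep the policy-taking signature, but ignore the policy and run sequentially:

#include <functional>
#include <numeric>
#include <vector>

struct SeqPolicy {};  // hypothetical placeholder policy tag

// Accept (and ignore) the execution policy and fall back to the
// sequential std::reduce, sidestepping the oneDPL lookup error.
template <typename Policy, typename InputIt, typename T, typename BinOp>
T reduce(Policy&&, InputIt first, InputIt last, T init, BinOp op) {
  return std::reduce(first, last, init, op);
}

int main() {
  std::vector<float> v = {1.f, 2.f, 3.f};
  return static_cast<int>(
      reduce(SeqPolicy{}, v.begin(), v.end(), 0.f, std::plus<float>()));
}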

@elalish (Owner) commented Jul 8, 2024

Good call; it was only reduce and transform_reduce. It's now running faster on my M1 than before this PR (~15% faster, not bad!). I suppose parallel reduce isn't too critical, but I'd love to understand the problem.

@elalish (Owner) commented Jul 8, 2024

Okay, now the Mac CI is just failing the CMake consumer test. What's the proper way to handle the oneDPL dependency? Do we link it in test-cmake.sh, or change something about the CMake install?

@elalish (Owner) commented Jul 8, 2024

This is the closest I've come to useful info about the reduce error. It almost seems like it's a oneDPL bug? Unless there's something about our lambdas that's confusing it.

@pca006132 (Collaborator, Author):

> This is the closest I've come to useful info about the reduce error. It almost seems like it's a oneDPL bug? Unless there's something about our lambdas that's confusing it.

Feels like it. Maybe you can try installing the latest oneDPL from source and see if it works? I think we should file an upstream issue if this persists.

Also, I wonder if we need PSTL at all. Most functions here are rather simple to implement with TBB's parallel_for and parallel_reduce (see the sketch below); the only exception is probably stable_sort?
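
As an illustration of that suggestion, a parallel sum built directly on TBB's functional-form parallel_reduce might look like this (hypothetical example, not the project's code):

#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>

#include <cstddef>
#include <functional>
#include <vector>

float parallelSum(const std::vector<float>& v) {
  return tbb::parallel_reduce(
      tbb::blocked_range<size_t>(0, v.size()), 0.0f,
      // Reduce each subrange sequentially...
      [&](const tbb::blocked_range<size_t>& r, float acc) {
        for (size_t i = r.begin(); i != r.end(); ++i) acc += v[i];
        return acc;
      },
      // ...and combine the partial results.
      std::plus<float>());
}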

@pca006132 (Collaborator, Author) commented Jul 8, 2024

And doing that would fix #787 as well, but it's probably a bit too much for today.

@pca006132 mentioned this pull request Jul 8, 2024
@pca006132 (Collaborator, Author):

@elalish things should work now.

@elalish (Owner) left a comment:

Thank you! This is excellent; let's discuss the PSTL / parallel reduce stuff in a follow-up issue.

if(oneDPL_FOUND)
  target_link_libraries(${PROJECT_NAME} INTERFACE oneDPL)
else()
  message(WARNING "oneDPL not found, sequential implementation is used instead")
@elalish (Owner):

👍

@elalish (Owner) commented Jul 8, 2024

> Also, I wonder if we need PSTL at all. Most functions here are rather simple to implement with tbb parallel_for and parallel_reduce; the only exception is probably stable_sort?

I see the appeal, but in the name of maintainability, PSTL feels safer long term. Maybe something better than TBB comes along, the way TBB overtook OpenMP? Maybe new architectures arrive that need a new parallel driver (e.g. Google TPUs)? I feel like PSTL has the best chance of supporting all these unknown futures with minimal effort on our part. That said, I'm open to being convinced otherwise.

@elalish merged commit e46938b into elalish:master Jul 8, 2024
21 checks passed
@pca006132 deleted the bye-thrust branch July 8, 2024 16:29
@elalish mentioned this pull request Nov 5, 2024