Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove FindHIP and use hip-config.cmake instead #526

Merged
merged 19 commits into from
Jan 11, 2022

Conversation

davidbeckingsale
Copy link
Member

No description provided.

@trws
Copy link
Member

trws commented Sep 22, 2021

This seems like a substantial improvement. A breaking change, but a substantial improvement nevertheless.

@davidbeckingsale
Copy link
Member Author

Yup, two major changes:

  • no hip-> nvcc support.
  • new target names blt::hip and blt::hip_runtime

I don't think people were using hip->nvcc anyway, and we can revisit that with the builtin CMake HIP support.
For the new target names, it's not ideal but there is already a hip target created by the hip-config so I was unable to use that name. The hip_runtime target could remain but I renamed it for consistency.

Copy link
Member

@kennyweiss kennyweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a review (yet), but please update the RELEASE-NOTES with this change, especially since it seems like a breaking change.

Some other questions:

  • does this require a specific version of CMake ?
  • will this PR necessitate a new BLT release?

@trws
Copy link
Member

trws commented Sep 24, 2021

It probably requires a specific version of rocm, but at least from my testing 3.14+ works just fine in cmake. I don't tend to keep any lower around so I can't speak to that.

@davidbeckingsale
Copy link
Member Author

Tom's right, just a specific HIP/rocm version, but I have checked back a long way and the file was there, so I think we should be good to go.

The change is only breaking due to target names so if there was a way to avoid that I think it would be preferable. Happy to take suggestions. If not, I think a release would be appropriate to make it easy to do things like Spack packages (things would have to be conditional based on BLT version).

@trws
Copy link
Member

trws commented Sep 25, 2021 via email

@davidbeckingsale
Copy link
Member Author

Hey folks I would be interested in getting this merged. Kenny mentioned updating release notes, but I think the fact that this requires namespacing the targets requires some more discussion. I'm happy to add the namespaces (and alias) to the other imported BLT targets for consistency, and we can deprecate the non-namespaced versions.

Thoughts?

Copy link
Member

@kennyweiss kennyweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem like a really nice cleanup of the hip setup and logic.

  • I like the blt::hip namespace change
  • For consistency, we should probably do the same to our other similar targets, e.g. nvcc, mpi, openmp, ...
    • I agree that we should think about/discuss these changes
  • Since this is a breaking change, we should also consider creating a new release.
  • In the meantime, it should definitely be documented in our RELEASE-NOTES

Since I have less familiarity with some of the implications, I'll wait on others to approve this before me.

Edit: Clarifying my second bullet -- if we are going to require people to update their build systems to accommodate the blt::hip namespace change, and are thinking about doing similar things for other dependencies, like nvcc, it would probably be a good idea to do it all at once.

@@ -59,9 +59,6 @@ mark_as_advanced(ENABLE_CLANG_CUDA)
set(BLT_CLANG_CUDA_ARCH "sm_30" CACHE STRING "Compute architecture to use when generating CUDA code with Clang")
mark_as_advanced(BLT_CLANG_CUDA_ARCH)
option(ENABLE_HIP "Enable HIP support" OFF)
cmake_dependent_option(ENABLE_CLANG_HIP "Enable Clang's native HIP support" OFF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please comment on why ENABLE_CLANG_HIP is no longer necessary?

Bonus: It seems that we still need ENABLE_CLANG_CUDA in some cuda cases. Please also comment on that.
It would be nice if we could document some of this, perhaps in a follow up PR.

@kennyweiss
Copy link
Member

kennyweiss commented Oct 29, 2021

I tried testing this branch locally on rznevada using cmake@3.19.2
and got some compilation warnings and errors (in addition to the missing blt::hip_runtime that I commented on above):

Config and build lines

> ml cmake/3.19.2 
> mkdir build-debug && cd build-debug
> cmake -C ../host-configs/llnl/toss_4_x86_64_ib_cray/cce@12.0.0_clang_hip.cmake ../tests/internal/
> make

Warning (seems safe to ignore)

<blt>/tests/internal/src/static_analysis/subtle_error_source.cpp:23:23: warning: array index 1 is past the end of the array (which contains 1 element) [-Warray-bounds]
  int first_element = array[1];
                      ^     ~
<blt>/tests/internal/src/static_analysis/subtle_error_source.cpp:22:3: note: array 'array' declared here
  int array[1] = {5};
  ^

Linker error

[ 59%] Linking CXX executable ../../../tests/blt_hip_smoke
/opt/cray/pe/cce/12.0.0/cce/x86_64/bin/cce_omp_offload_linker: Unsupported hip accelerator target: gfx900
clang-12: error: linker command failed with exit code 255 (use -v to see invocation)
make[2]: *** [blt/tests/smoke/CMakeFiles/blt_hip_smoke.dir/build.make:104: tests/blt_hip_smoke] Error 255
make[1]: *** [CMakeFiles/Makefile2:1978: blt/tests/smoke/CMakeFiles/blt_hip_smoke.dir/all] Error 2

Edit: Are you seeing these, or am I doing something wrong?

Update: Looks like the warning is related to a static_analysis test and is expected, so we can ignore that one.

Update 2: I'm seeing the same error with cmake@3.21.1.
Even though the host config is explicitly specifying set(CMAKE_HIP_ARCHITECTURES "gfx908" CACHE STRING "")
Something is injecting lots of -offload-arch flags:
--offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908

Here's the full build and link line from cmake/make

[ 31%] Building CXX object blt/tests/smoke/CMakeFiles/blt_hip_gtest_smoke.dir/blt_hip_gtest_smoke.cpp.o
cd <blt>/build-debug/blt/tests/smoke && /usr/tce/packages/cce-tce/cce-12.0.0/bin/crayCC
	-DGTEST_HAS_DEATH_TEST=0
	-D__HIP_PLATFORM_AMD__=1
	-D__HIP_PLATFORM_HCC__=1
	-I<blt>/thirdparty_builtin/googletest-master-2020-01-07/googletest
	-isystem <blt>/thirdparty_builtin/googletest-master-2020-01-07/googletest/include
	-isystem /opt/rocm-4.2.0/hip/../include
	-isystem /opt/rocm-4.2.0/llvm/lib/clang/12.0.0/include/..
	-isystem /opt/rocm-4.2.0/hip/include
	-isystem /opt/rocm-4.2.0/include
	-Wall
	-Wextra     
	-fPIE
	--rocm-path=/opt/rocm-4.2.0
	-xhip
	--offload-arch=gfx900
	--offload-arch=gfx906
	--offload-arch=gfx908
	-std=c++11
	-MD
	-MT blt/tests/smoke/CMakeFiles/blt_hip_gtest_smoke.dir/blt_hip_gtest_smoke.cpp.o
	-MF CMakeFiles/blt_hip_gtest_smoke.dir/blt_hip_gtest_smoke.cpp.o.d
	-o CMakeFiles/blt_hip_gtest_smoke.dir/blt_hip_gtest_smoke.cpp.o
	-c <blt>/tests/smoke/blt_hip_gtest_smoke.cpp

[ 33%] Linking CXX executable ../../../tests/blt_hip_gtest_smoke
cd <blt>/build-debug/blt/tests/smoke && /usr/tce/backend/installations/linux-rhel8-x86_64/gcc-10.2.1/cmake-3.21.1-r36vais2fcnwnbsd63glpwypg55bts7y/bin/cmake
	-E cmake_link_script CMakeFiles/blt_hip_gtest_smoke.dir/link.txt
	--verbose=1
/usr/tce/packages/cce-tce/cce-12.0.0/bin/crayCC 
	-Wall
	-Wextra      CMakeFiles/blt_hip_gtest_smoke.dir/blt_hip_gtest_smoke.cpp.o
	-o ../../../tests/blt_hip_gtest_smoke 
	-Wl,-rpath,/opt/rocm-4.2.0/hip/lib:/opt/rocm-4.2.0/lib ../../../lib/libgtest_main.a ../../../lib/libgtest.a /opt/rocm-4.2.0/hip/lib/libamdhip64.so.4.2.40200
	--hip-link
	--offload-arch=gfx900
	--offload-arch=gfx906
	--offload-arch=gfx908
	-L"/opt/rocm-4.2.0/llvm/lib/clang/12.0.0/include/../lib/linux"
	-lclang_rt.builtins-x86_64
	-Wl,-rpath-link,/opt/rocm-4.2.0/lib 

/opt/cray/pe/cce/12.0.0/cce/x86_64/bin/cce_omp_offload_linker: Unsupported hip accelerator target: gfx900

clang-12: error: linker command failed with exit code 255 (use
	-v to see invocation)
make[2]: *** [blt/tests/smoke/CMakeFiles/blt_hip_gtest_smoke.dir/build.make:100: tests/blt_hip_gtest_smoke] Error 255


set(ROCM_PATH "/opt/rocm-4.1.0/" CACHE PATH "")
set(HIP_PLATFORM "clang" CACHE STRIING "")

set(BLT_CLANG_HIP_ARCH "gfx908" CACHE STRING "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BLT_CLANG_HIP_ARCH got removed in #521
I think it should be:

Suggested change
set(BLT_CLANG_HIP_ARCH "gfx908" CACHE STRING "")
set(CMAKE_HIP_ARCHITECTURES "gfx908" CACHE STRING "")

@kennyweiss
Copy link
Member

@davidbeckingsale -- I discovered that our current hip host-config was also broken and created #537 to fix it for cmake@3.19. It's still not working correctly for cmake@3.21.

See also #536

@davidbeckingsale
Copy link
Member Author

Thanks @kennyweiss for taking a look at this. I'll make the changes you have suggested and will investigate the build issue - I have a hunch as to where it might be coming from.

@DavidPoliakoff
Copy link
Contributor

Just FYI, this fixes a really obnoxious bug for us in "Kokkos + Perfsuite interoperability" land. So if it hits credibility on your side, we'd love to see a merge (and ideally then a merge of that BLT into RAJA, and that RAJA into Perfsuite, if you're feeling charitable :) )

@rhornung67
Copy link
Member

rhornung67 commented Nov 9, 2021

Just FYI, this fixes a really obnoxious bug for us in "Kokkos + Perfsuite interoperability" land. So if it hits credibility on your side, we'd love to see a merge (and ideally then a merge of that BLT into RAJA, and that RAJA into Perfsuite, if you're feeling charitable :) )

@DavidPoliakoff that's the plan. a few folks are out on vacation until the middle of next week. we'll work on a perf suite release after that. Actually, it looks like the next BLT release will not come out until after the new year. So it will be a while before this gets pulled into RAJA repos.

DEFINES ${_hip_compile_defines}
COMPILE_FLAGS ${_hip_compile_flags})
if (NOT ROCM_PATH)
find_path(ROCM_PATH
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

@bmhan12
Copy link
Contributor

bmhan12 commented Nov 11, 2021

I'm encountering the following errors during configuration when using this branch in the blt submodule for axom:

CMake Error at blt/cmake/BLTMacros.cmake:559 (add_library):
  The target name "blt::hip" is reserved or not valid for certain CMake
  features, such as generator expressions, and may result in undefined
  behavior.
Call Stack (most recent call first):
  blt/cmake/thirdparty/SetupHIP.cmake:27 (blt_import_library)
  blt/cmake/thirdparty/SetupThirdParty.cmake:56 (include)
  blt/SetupBLT.cmake:112 (include)
  CMakeLists.txt:167 (include)


CMake Error at blt/cmake/BLTMacros.cmake:437 (message):
  blt_patch_target() NAME argument must be a native CMake target
Call Stack (most recent call first):
  blt/cmake/BLTMacros.cmake:566 (blt_patch_target)
  blt/cmake/thirdparty/SetupHIP.cmake:27 (blt_import_library)
  blt/cmake/thirdparty/SetupThirdParty.cmake:56 (include)
  blt/SetupBLT.cmake:112 (include)
  CMakeLists.txt:167 (include)

@trws
Copy link
Member

trws commented Nov 16, 2021

That error is peculiar, what version of cmake are you using?

@davidbeckingsale
Copy link
Member Author

That error is peculiar, what version of cmake are you using?

Lots of us have hit this - a target that is not imported cannot have :: in the name https://cmake.org/cmake/help/latest/policy/CMP0028.html

BLT provides an option to export the tpl targets (meaning they can't be created as imported libraries) - this error appears when the option BLT_EXPORT_THIRDPARTY is On.

@trws
Copy link
Member

trws commented Nov 16, 2021

Imported or ALIAS, can we name the target __blt_dont_use_this_one_seriously_i_mean_it_hip and make an alias to it that's blt::hip?

@davidbeckingsale
Copy link
Member Author

Imported or ALIAS, can we name the target __blt_dont_use_this_one_seriously_i_mean_it_hip and make an alias to it that's blt::hip?

Great idea! I think that should work.

@cyrush
Copy link
Member

cyrush commented Dec 9, 2021

For the export targets issue:

When used downstream, aren't projects exporting blt's targets into their own namespaces anyway?
Like:

https://github.com/LLNL/Umpire/blob/e118325fa68bda4a1cb5510e825e6e8680525353/cmake/SetupUmpireThirdParty.cmake#L72

If we called these blt_zzz that would be an improvement and still compatible with this strategy?

They would be namespaces, but protected in the scope of the imported project.

@white238
Copy link
Member

white238 commented Jan 10, 2022

This is good to merge pending someone else's review and my couple changes.

Also please update the release notes and this page docs/tutorial/common_hpc_dependencies.rst. You don't have to go all out in documenting the latter but please update the target names.

Thanks @davidbeckingsale !

@davidbeckingsale
Copy link
Member Author

Also please update the release notes and this page docs/tutorial/common_hpc_dependencies.rst. You don't have to go all out in documenting the latter but please update the target names.

Good catch, will do.

cmake/thirdparty/SetupHIP.cmake Outdated Show resolved Hide resolved
cmake/thirdparty/SetupHIP.cmake Show resolved Hide resolved
Co-authored-by: Chris White <white238@llnl.gov>
Copy link
Member

@kennyweiss kennyweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working through this @davidbeckingsale !

.. note:: The recommended usage of the HIP targets is via the ``blt::hip`` and
``blt::hip_runtime`` aliases. Alias targets cannot be exported, so the
``blt_hip``/``blt_hip_runtime`` names can be used for this purpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention anything about the underlying targets, and if those are still available for users?
(I think those are hip::host and hip::device, but I'm not sure)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are available, but hip::host actually seems broken (missing include) so I don't think we should recommend it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior would also be different would it not?

I admit I'm wondering, especially with this doc change, if having the aliases is worth it since now there's a split between at least two uses. Should it just always be blt_<backend> perhaps rather than have both? At least that way it can be made consistent everywhere. Really not fond of how aliases only kinda-sorta work across projects like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😭

Installing a target provided by BLT is a pretty specific use case that afaik only a couple of projects are doing. Using blt_<backend> is fine but just looks wrong to me, I feel like the namespaced target is correct since we want people to think of them as imported targets.

Co-authored-by: Chris White <white238@llnl.gov>
@davidbeckingsale davidbeckingsale merged commit 667158f into develop Jan 11, 2022
@white238 white238 deleted the feature/no-findhip branch January 11, 2022 06:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants