
Releases: eth-cscs/spla

SPLA 1.6.1

25 Jun 18:45
2bf5544

SPLA 1.6.1 Release Notes

Bug Fixes

  • Fixed a compilation error caused by a renamed pointer attribute in HIP 6.0.0 and later

SPLA 1.6.0

07 Jun 12:08
261fdae

SPLA 1.6.0 Release Notes

Changes

  • Removed the deprecated direct multi-threading control through set_num_threads(...). The number of threads must now be set through the linked BLAS library, for example with the OMP_NUM_THREADS environment variable for BLAS libraries compiled with OpenMP.
  • The CMake option SPLA_HOST_BLAS was replaced by BLA_VENDOR from the FindBLAS CMake module. To link with LibSci from Cray, set BLA_VENDOR to CRAY_LIBSCI. Other libraries not found by the default CMake module can be used by setting BLAS_LIBRARIES directly to the required link command.
  • Switched to C++17
    • CUDA version requirement increased to 11.0
    • CMake version requirement increased to 3.18
  • Updated dependencies required to build tests
  • Added a CMake option to disable downloading test dependencies

Bug Fixes

  • Fixed compilation with rocBLAS 4.0.0 (ROCm 6.0)

SPLA 1.5.5

25 Apr 07:44
6fe85e4

SPLA 1.5.5 Release Notes

Changes

  • Controlling the number of threads through set_num_threads(...) has been deprecated and will be removed soon. Set the number of threads externally through the BLAS library instead.
    Background:
    Multi-threading is only used internally to parallelize host GEMM calls. This is done either by setting the number of threads used by the BLAS library, where possible, or by using OpenMP to parallelize BLAS calls if the BLAS library is thread-safe. This approach entails considerable complexity and can cause issues because BLAS libraries behave differently. For example, with OpenBLAS, setting the number of threads through openblas_set_num_threads(...) to a value higher than the environment variable OMP_NUM_THREADS will cause a deadlock inside OpenBLAS.
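    For code that still sets the OpenBLAS thread count programmatically, one defensive option is to never request more threads than OMP_NUM_THREADS allows. The helper below is a hypothetical sketch: safe_blas_thread_count is not part of SPLA or OpenBLAS, it takes the variable's value as a parameter so it can be tested in isolation, and leaving the request unchanged when OMP_NUM_THREADS is unset or not a positive integer is an assumed fallback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not part of SPLA or OpenBLAS): caps a requested BLAS thread
// count at the value of OMP_NUM_THREADS, avoiding the OpenBLAS deadlock described above.
// omp_num_threads is the variable's value, e.g. std::getenv("OMP_NUM_THREADS").
int safe_blas_thread_count(int requested, const char* omp_num_threads) {
  assert(requested > 0);
  if (omp_num_threads == nullptr) return requested;  // variable not set: no cap
  char* end = nullptr;
  const long limit = std::strtol(omp_num_threads, &end, 10);
  // Not a positive integer: leave the request unchanged (an assumed fallback).
  if (end == omp_num_threads || limit <= 0) return requested;
  // Only the leading number is used, so a list such as "4,2" is capped at 4.
  return static_cast<int>(std::min<long>(requested, limit));
}
```

    In application code, the result would then be passed on, for example as openblas_set_num_threads(safe_blas_thread_count(n, std::getenv("OMP_NUM_THREADS"))).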

Bug Fixes

  • Fixed a missing C++ standard library header, which could cause a compilation error with GCC 13

SPLA 1.5.4

15 Mar 08:46
f775126

SPLA 1.5.4 Release Notes

Bug Fixes

  • Fixed an issue that could cause a deadlock when the input memory locations (host or device memory) differed between MPI ranks while using the GPU for processing

SPLA 1.5.3

17 Feb 14:51
f50aad3

SPLA 1.5.3 Release Notes

Features

  • Improved MKL detection with updated paths for oneMKL

Bug Fixes

  • Fixed missing functions in the Fortran interface
  • Fixed an MPI error when MPI_Finalize was called before the context was destroyed

SPLA 1.5.2

04 Nov 15:48
1232482

SPLA 1.5.2 Release Notes

Features

  • Support for Arm Performance Libraries

Bug Fixes

  • Fixed a performance issue on AMD GPUs with latest versions of ROCm / HIP due to changes required for device pointer detection

SPLA 1.5.1

02 Jul 07:00
8f2c8da

SPLA 1.5.1 Release Notes

Bug Fixes

  • Fixed issues with installed CMake config files:
    • Custom find modules are now found correctly
    • Added a workaround for a bug in the find_dependency macro with CMake < 3.15.0, where components of MPI and OpenMP could be missing
    • When using a generic BLAS library, the missing BLAS::blas target is now available with CMake < 3.18.0

SPLA 1.5.0

17 Jun 07:22
f3a7b56

SPLA 1.5.0 Release Notes

Features

  • Functions for allocating host and device memory can now be passed to a context, allowing the use of external memory pools.
  • Added internal caching of memory transfers to the GPU, improving performance with host memory input
  • Improved detection of BLAS libraries
  • Overhauled the README and added examples
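Passing allocation functions to a context lets a library draw its internal buffers from a pool the application already manages. The sketch below illustrates the idea with a hypothetical ContextSketch class and a toy size-keyed pool. These names and signatures are assumptions made for illustration and are not SPLA's actual API; consult the SPLA headers for the real context setters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <map>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a library context (NOT the SPLA API): it obtains all
// internal host buffers through user-provided allocation functions.
class ContextSketch {
 public:
  using AllocFn = std::function<void*(std::size_t)>;
  using FreeFn = std::function<void(void*)>;

  // Register external allocation functions, e.g. ones backed by a memory pool.
  void set_host_allocator(AllocFn alloc, FreeFn dealloc) {
    alloc_ = std::move(alloc);
    dealloc_ = std::move(dealloc);
  }

  // Used internally whenever a temporary host buffer is needed.
  void* allocate(std::size_t size) { return alloc_(size); }
  void deallocate(void* ptr) { dealloc_(ptr); }

 private:
  // Defaults: plain malloc/free.
  AllocFn alloc_ = [](std::size_t size) { return std::malloc(size); };
  FreeFn dealloc_ = [](void* ptr) { std::free(ptr); };
};

// Toy pool (hypothetical): freed blocks are cached by size and reused.
struct ToyPool {
  std::multimap<std::size_t, void*> cached;     // returned blocks, keyed by size
  std::unordered_map<void*, std::size_t> live;  // blocks currently handed out
  int system_allocations = 0;                   // number of malloc calls made

  void* get(std::size_t size) {
    void* ptr = nullptr;
    auto it = cached.find(size);
    if (it != cached.end()) {  // reuse a cached block of exactly this size
      ptr = it->second;
      cached.erase(it);
    } else {
      ptr = std::malloc(size);
      ++system_allocations;
    }
    live[ptr] = size;
    return ptr;
  }

  void put(void* ptr) {
    auto it = live.find(ptr);
    assert(it != live.end());  // only blocks obtained from get() may be returned
    cached.emplace(it->second, ptr);
    live.erase(it);
  }

  // Frees cached blocks; blocks still handed out are the caller's responsibility.
  ~ToyPool() {
    for (auto& entry : cached) std::free(entry.second);
  }
};
```

A context would be wired to the pool with two lambdas, e.g. ctx.set_host_allocator([&pool](std::size_t s) { return pool.get(s); }, [&pool](void* p) { pool.put(p); }). A production pool would additionally need to be thread-safe, and device memory would come from the GPU runtime rather than malloc.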

SPLA 1.4.0

14 Apr 12:30
730d472

SPLA 1.4.0 Release Notes

Features

  • Added the new function pgemm_ssbtr, which allows computing distributed triangular matrices
  • Implemented a ring communication scheme for pgemm_sbs, which should now match the performance of pgemm_ssb
  • Improved Fortran interface for better compatibility with CUDA Fortran
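Computing only a triangular part roughly halves the arithmetic of a square product. Assuming pgemm_ssbtr has the same form as pgemm_ssb, C = alpha * A^H * B + beta * C, restricted to the triangle selected by a fill mode, the serial sketch below shows the idea for real matrices, where A^H reduces to A^T. The function name gemm_tn_lower is hypothetical, and the sketch ignores how A, B and C are distributed across MPI ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial illustration (not SPLA code): computes
//   C = alpha * A^T * B + beta * C
// but only updates the lower triangle (row index i >= column index j) of the
// m x n matrix C; the strictly upper part is left untouched.
// A is k x m, B is k x n and C is m x n, all stored column-major.
void gemm_tn_lower(int m, int n, int k, double alpha, const std::vector<double>& a,
                   const std::vector<double>& b, double beta, std::vector<double>& c) {
  assert(a.size() == static_cast<std::size_t>(k) * m);
  assert(b.size() == static_cast<std::size_t>(k) * n);
  assert(c.size() == static_cast<std::size_t>(m) * n);
  for (int j = 0; j < n; ++j) {
    for (int i = j; i < m; ++i) {  // skip rows above the diagonal
      double sum = 0.0;
      for (int l = 0; l < k; ++l) sum += a[l + i * k] * b[l + j * k];  // A(l,i) * B(l,j)
      double& cij = c[i + j * m];
      // As in BLAS, C is not read when beta == 0.
      cij = alpha * sum + (beta == 0.0 ? 0.0 : beta * cij);
    }
  }
}
```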

SPLA 1.3.0

01 Mar 09:51
8c1e050

SPLA 1.3.0 Release Notes

Features

  • Implemented a ring communication scheme for pgemm_ssb, significantly improving performance in most cases
  • Added a Fortran interface, with optional compilation of the module
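A ring scheme replaces a collective reduction with P - 1 point-to-point steps in which each rank only exchanges data with its neighbours. The serial simulation below shows a generic ring reduce-scatter: every rank holds a partial result for every block of C, and each block's partial sum travels once around the ring, picking up each rank's contribution, until it reaches its owner. This is an assumed model of the idea with a hypothetical function name; SPLA's actual ring implementation may differ in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial simulation of a ring reduce-scatter over p ranks.
// contrib[r][b] is rank r's local partial result for block b of C (in pgemm_ssb
// terms, rank r's share of A^H * B for that block); block b is owned by rank b.
// Returns result[r]: the fully reduced block held by rank r at the end, i.e. block r.
std::vector<double> ring_reduce_scatter(const std::vector<std::vector<double>>& contrib) {
  const int p = static_cast<int>(contrib.size());
  assert(p > 0);
  for (const auto& row : contrib) assert(static_cast<int>(row.size()) == p);

  // Invariant: before step s, rank r holds the partial sum of block (r - 1 - s) mod p.
  std::vector<double> buffer(p);
  for (int r = 0; r < p; ++r) buffer[r] = contrib[r][(r + p - 1) % p];

  for (int s = 0; s < p - 1; ++s) {
    // All ranks simultaneously send their buffer to the right neighbour (r + 1) mod p.
    std::vector<double> received(p);
    for (int r = 0; r < p; ++r) received[(r + 1) % p] = buffer[r];
    // Each rank adds its own contribution to the block it just received.
    for (int r = 0; r < p; ++r) {
      const int block = ((r - 2 - s) % p + p) % p;
      buffer[r] = received[r] + contrib[r][block];
    }
  }
  // After p - 1 steps, rank r holds block (r - p) mod p = r, fully reduced.
  return buffer;
}
```

Each step moves only one block per rank, so the communication volume per rank stays constant as P grows, and in a real implementation each rank can compute its next local contribution while the previous partial sum is in flight.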