Releases: eth-cscs/spla
SPLA 1.6.1
SPLA 1.6.0
SPLA 1.6.0 Release Notes
Changes
- Removed deprecated direct multi-threading control through `set_num_threads(...)`. Multi-threading must now be set through the linked BLAS library, for example with the `OMP_NUM_THREADS` environment variable used by BLAS libraries compiled with OpenMP.
- CMake option `SPLA_HOST_BLAS` was replaced by `BLA_VENDOR` from the FindBLAS CMake module. For linking with Libsci from Cray, set `BLA_VENDOR` to `CRAY_LIBSCI`. Other libraries not found by the default CMake module can be used by directly setting `BLAS_LIBRARIES` to the required link command.
- Switch to C++17
- CUDA version requirement increased to 11.0
- CMake version requirement increased to 3.18
- Updated dependencies required to build tests
- Added CMake option to disable downloading of test dependencies
Bug Fixes
- Fix compilation with rocBLAS 4.0.0 (ROCm 6.0)
SPLA 1.5.5
SPLA 1.5.5 Release Notes
Changes
- Controlling the number of threads through `set_num_threads(...)` has been deprecated and will be removed soon. Set the number of threads externally through the BLAS library instead.
Background:
Multi-threading is only used internally for parallelizing host GEMM calls. This is done either by setting the number of threads used by the BLAS library, if possible, or by using OpenMP to parallelize BLAS calls if the BLAS library is thread-safe. This approach entails considerable complexity and can cause issues due to differences in BLAS library behaviour. For example, with OpenBLAS, setting the number of threads through `openblas_set_num_threads(...)` to a value higher than the environment variable `OMP_NUM_THREADS` will cause a deadlock inside OpenBLAS.
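The OpenBLAS deadlock described above suggests one defensive pattern for applications that still set the thread count themselves: never request more threads than `OMP_NUM_THREADS` allows. The following is a minimal sketch of that idea, not code from SPLA or OpenBLAS. The helper name `clamp_to_omp_num_threads` is hypothetical, and the sketch assumes that only the first value of `OMP_NUM_THREADS` matters.

```cpp
#include <algorithm>
#include <cstdlib>

// Hypothetical helper (not part of SPLA or OpenBLAS): returns the thread
// count that is safe to request given the limit in OMP_NUM_THREADS.
// If the variable is unset or does not start with a positive integer,
// the requested value is returned unchanged.
int clamp_to_omp_num_threads(int requested) {
  const char* env = std::getenv("OMP_NUM_THREADS");
  if (env == nullptr) return requested;

  char* end = nullptr;
  // Parses the leading integer only, so a list such as "4,2" yields 4.
  const long limit = std::strtol(env, &end, 10);
  if (end == env || limit <= 0) return requested;

  return static_cast<int>(std::min<long>(requested, limit));
}

// Possible use with OpenBLAS (requires linking against OpenBLAS):
//   openblas_set_num_threads(clamp_to_omp_num_threads(n));
```

Clamping keeps the application-side request within the limit that the release notes identify as safe, while leaving the environment variable as the single source of truth.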
Bug Fixes
- Fixed missing C++ standard library header, which can cause a compilation error with GCC 13
SPLA 1.5.4
SPLA 1.5.4 Release Notes
Bug Fixes
- Fixed an issue that could cause a deadlock when the input memory locations (host or device memory) differed between MPI ranks while using the GPU for processing
SPLA 1.5.3
SPLA 1.5.3 Release Notes
Features
- Improved MKL detection with updated paths for oneMKL
Bug Fixes
- Fixed missing functions in Fortran interface
- Fixed an MPI error that occurred if `MPI_Finalize` was called before the context was destroyed
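Independently of this fix, an application can avoid the ordering problem by scoping the context so that it is destroyed before `MPI_Finalize` runs. The sketch below only illustrates that ordering. `FakeContext` is a stand-in type invented for this example so that it compiles without MPI or SPLA, and the log strings merely mark where the real calls would go.

```cpp
#include <string>
#include <vector>

// Stand-in for a library context (hypothetical, not the SPLA class).
// It records its construction and destruction in a shared log.
struct FakeContext {
  std::vector<std::string>& log;
  explicit FakeContext(std::vector<std::string>& l) : log(l) {
    log.push_back("context created");
  }
  ~FakeContext() { log.push_back("context destroyed"); }
};

// Returns the order of events when the context lives in an inner scope.
std::vector<std::string> run_with_scoped_context() {
  std::vector<std::string> log;
  log.push_back("MPI_Init");  // where MPI_Init would be called
  {
    FakeContext ctx(log);
    log.push_back("work");    // where the library calls would go
  }                           // ctx is destroyed here, before finalization
  log.push_back("MPI_Finalize");  // where MPI_Finalize would be called
  return log;
}
```

The inner braces are the whole technique: the context's destructor runs at the closing brace, so it always precedes the finalization step.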
SPLA 1.5.2
SPLA 1.5.2 Release Notes
Features
- Support for Arm Performance Libraries
Bug Fixes
- Fixed a performance issue on AMD GPUs with latest versions of ROCm / HIP due to changes required for device pointer detection
SPLA 1.5.1
SPLA 1.5.1 Release Notes
Bug Fixes
- Fixed issues with installed CMake config files:
- Custom find modules are now found correctly
- Workaround for bug in find_dependency macro with CMake < 3.15.0, where components of MPI and OpenMP may be missing.
- When using a generic blas library, the missing BLAS::blas target is now available with CMake < 3.18.0
SPLA 1.5.0
SPLA 1.5.0 Release Notes
Features
- Functions for allocating host and device memory can now be passed to a context, allowing the use of external memory pools.
- Added internal caching of memory transfers to GPU, improving performance with host memory input
- Improved detection of BLAS libraries
- Overhaul of README and added examples
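To illustrate what passing allocation functions for an external memory pool might look like, here is a minimal sketch of an allocate/deallocate pair built from `std::function` objects. The notes do not give the SPLA registration call or its exact signatures, so none is shown. The names `PoolStats`, `AllocatorPair`, and `make_tracked_host_allocator` are hypothetical, and the callable signatures are an assumption to check against the SPLA documentation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <memory>

// Hypothetical bookkeeping for an external pool; names are illustrative only.
struct PoolStats {
  std::size_t live_allocations = 0;
  std::size_t total_bytes = 0;
};

// Assumed shape of the callables handed to a library context.
struct AllocatorPair {
  std::function<void*(std::size_t)> allocate;
  std::function<void(void*)> deallocate;
};

// Builds a pair that forwards to malloc/free and records usage.
// A real pool would hand out memory from a pre-allocated region instead.
AllocatorPair make_tracked_host_allocator(std::shared_ptr<PoolStats> stats) {
  AllocatorPair pair;
  pair.allocate = [stats](std::size_t bytes) -> void* {
    void* ptr = std::malloc(bytes);
    if (ptr != nullptr) {
      ++stats->live_allocations;
      stats->total_bytes += bytes;
    }
    return ptr;
  };
  pair.deallocate = [stats](void* ptr) {
    if (ptr == nullptr) return;
    std::free(ptr);
    --stats->live_allocations;
  };
  return pair;
}
```

Capturing the statistics through a `std::shared_ptr` keeps the callables valid even if they outlive the scope that created them, which matters when a library stores them for later use.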
SPLA 1.4.0
SPLA 1.4.0 Release Notes
Features
- New function `pgemm_ssbtr` available, which allows computation of distributed triangular matrices
- Implemented a ring communication scheme for `pgemm_sbs`, which should now match the performance of `pgemm_ssb`
- Improved Fortran interface for better compatibility with CUDA Fortran
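The notes do not describe how the ring communication scheme is implemented in SPLA. As a rough illustration of the general pattern only, the sketch below simulates a generic ring on a single process: each rank starts with its own block, works on the block it currently holds, then passes it to its right-hand neighbour. All function names are hypothetical, and no MPI calls are made.

```cpp
#include <vector>

// Neighbours of `rank` in a ring of `size` ranks.
int ring_right(int rank, int size) { return (rank + 1) % size; }
int ring_left(int rank, int size) { return (rank + size - 1) % size; }

// Simulates the ring on one process. Returns, for each rank, the original
// owners of the blocks it processed, in order, over `size` steps.
std::vector<std::vector<int>> simulate_ring(int size) {
  std::vector<int> held(size);
  for (int r = 0; r < size; ++r) held[r] = r;  // each rank starts with its own block

  std::vector<std::vector<int>> seen(size);
  for (int step = 0; step < size; ++step) {
    // "Compute" on the block currently held by each rank.
    for (int r = 0; r < size; ++r) seen[r].push_back(held[r]);

    // Every rank passes its block to the right-hand neighbour.
    std::vector<int> next(size);
    for (int r = 0; r < size; ++r) next[ring_right(r, size)] = held[r];
    held = next;
  }
  return seen;
}
```

After `size` steps every rank has processed every block exactly once while only ever exchanging data with its two neighbours, which is the property that typically makes ring schemes attractive for distributed matrix multiplication.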
SPLA 1.3.0
SPLA 1.3.0 Release Notes
Features
- Implemented a ring communication scheme for `pgemm_ssb`, significantly improving performance in most cases
- Fortran interface with optional compilation of the module