Simple implementation of the parallel data-partitioned batch SOM algorithm; see section 3.2 of Lawrence et al., 1995.
To generate the documentation simply run
doxygen
in the current directory.
The documentation will then be available at ./docs/html/index.html
If you do not have doxygen installed, you can install it, e.g. with apt:
apt install graphviz doxygen
We recommend installing graphviz so that doxygen can generate all sorts of useful diagrams.
To build DIAPASOM you need
- a C++17 compiler (inline variables are used)
- cmake >= 3.0
- OpenMPI (optional, for MPI parallel support)
- OpenSHMEM (optional, for OpenSHMEM parallel support)
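For example, on a Debian-based system the build dependencies can typically be installed with apt (package names are indicative and may differ on other distributions; recent OpenMPI packages usually include the OpenSHMEM layer mentioned above):
apt install g++ cmake libopenmpi-dev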
The package supports 4 different implementations:
- serial: sequential implementation (no parallel computation)
- mpi: MPI implementation
- oshmem: OpenSHMEM implementation (supported by OpenMPI 2)
- oshmem1s: OpenSHMEM one-sided implementation, which employs OpenSHMEM's one-sided communication capabilities for inter-process communication
All 4 implementations are enabled by default in CMakeLists.txt. Thus, by running
cmake SOURCE_DIRECTORY
cmake --build .
you will compile all 4 implementations of the SOM algorithm, both the executables and the shared libraries (a typical out-of-source build sequence is sketched after the list):
- libdiapasom_serial
- libdiapasom_mpi
- libdiapasom_oshmem
- libdiapasom_oshmem1s
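For instance, a typical out-of-source sequence, assuming you create a separate build directory inside the source tree (the directory name is just illustrative), is:
mkdir build
cd build
cmake ..
cmake --build .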
If you want to customise the build process and compile, say, only the OpenSHMEM implementations (regular and one-sided), thus disabling the serial and MPI implementations, you can run:
cmake -D serial=OFF -D mpi=OFF SOURCE_DIRECTORY
cmake --build .
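To inspect the available configuration switches (the implementation toggles above plus the usual CMake cache variables) you can use CMake's standard option listing, e.g.:
cmake -LH SOURCE_DIRECTORY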
By default cmake will build the Debug versions of the libraries and the executables. You can alter this behaviour, say to obtain the Release versions, by running:
cmake -DCMAKE_BUILD_TYPE=Release SOURCE_DIRECTORY
cmake --build .
By default testing is enabled and after a successful build you can test all the built implementations with ctest:
ctest --stop-on-failure
Currently there are 80 tests for each implementation, i.e., 320 tests in total.
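If you only want to exercise a single implementation, ctest's standard name filter can be used; for example, assuming the MPI tests follow the naming scheme of the "mpiBS0RS0" test shown below:
ctest -R mpi --stop-on-failure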
After the build you can run any of the versions directly; for example, for the MPI version with 3 ranks:
mpirun -np 3 diapasom.mpi dataset=SOURCE_DIRECTORY/tests/dataset1.txt
This will run the program with the default parameter values and write to the current directory the initial (epoch 0) state of the lattice (lattice0.out) and the state of the lattice at the last epoch of the training (lattice101.out).
Important: dataset is a mandatory argument.
Moreover, the file referenced by dataset must be accessible to all the ranks.
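The other implementations are invoked analogously; for instance, a serial run needs no MPI launcher (the executable name below is inferred from the naming scheme above and may differ in your build):
./diapasom.serial dataset=SOURCE_DIRECTORY/tests/dataset1.txt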
It is quite easy to modify the values of the training parameters: you just need to pass them as CLI arguments. For example, this is how the test "mpiBS0RS0" (batchsize = 0 and random seed = 0) is run:
mpirun -np 3 ./build/Debug/diapasom.mpi dataset=./tests/dataset1.txt latticedim=10 batchsize=0 epochs=20 outevery=10 rseed=0 epcall=./build/Debug/libepcall.so
As can be seen, you can provide a dynamic library for the "epcall" parameter containing an
"extern "C" void epcall(const som::Lattice*)" function that will be called at each epoch.
This is used by the test suite (tests/epcall.cpp) to produce more output than the default in order to help verify that the results are correct and reproducible.
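As an illustration, a minimal callback library could look like the following sketch (file and library names are hypothetical; only the signature quoted above is assumed, and som::Lattice is merely forward-declared instead of including the project headers):

// epcall_example.cpp -- hypothetical minimal epoch callback
#include <cstdio>

namespace som { class Lattice; }   // forward declaration; the real accessors live in the DIAPASOM headers

extern "C" void epcall(const som::Lattice* lattice)
{
    // Called once per epoch: here we merely log that the callback fired.
    std::printf("epcall: invoked with lattice at %p\n",
                static_cast<const void*>(lattice));
}

It could then be compiled into a shared library, e.g. with
g++ -std=c++17 -shared -fPIC epcall_example.cpp -o libepcall_example.so
and passed to the program via epcall=./libepcall_example.so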
For a complete list of available parameters see the documentation for the som::TrainSettings class.
The development of this package was supported by the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955776.