-
Notifications
You must be signed in to change notification settings - Fork 36
/
Copy pathREADME.amdahl
49 lines (30 loc) · 2.07 KB
/
README.amdahl
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
For large fits across mutliple nodes you may find that the proposal step is a bottleneck.
The compiled DE stepper can speed this up by a factor of 2 compared to the numba
version that is usually used. This might help on large allocations, but not enough
to support it in the automatic build infrastructure.
Update: MSVC is 6x faster than numba on one machine. Need to check performance with and
without compiled on HPC hardware to know if the compiled version is required.
To use the compiled de stepper and bounds checks, first make sure the "random123" library submodule has been checked out
git clone --branch v1.14.0 https://github.com/DEShawResearch/random123.git bumps/dream/random123
Then, to compile on unix use:
(cd bumps/dream && cc compiled.c -I ./random123/include/ -O2 -fopenmp -shared -lm -o _compiled.so -fPIC -DMAX_THREADS=64)
On OS/X clang doesn't support OpenMP:
(cd bumps/dream && cc compiled.c -I ./random123/include/ -O2 -shared -lm -o _compiled.so -fPIC -DMAX_THREADS=64)
MSVC on windows using Visual Studio build tools (2022):
% set up compiler environment
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x86_amd64
cd bumps\dream
cl compiled.c -I .\random123\include /O2 /openmp /LD /GL /Fe_compiled.so
This only works when _compiled.so is in the bumps/dream directory. If running
from a pip installed version, you will need to fetch the bumps repository:
$ git clone https://github.com/bumps/bumps.git
$ cd bumps
Compile as above, then find the bumps install path using the following:
$ python -c "import bumps.dream; print(bumps.dream.__file__)"
#dream/path/__init__.py
Copy the compiled module to the install, with the #dream/path printed above:
$ cp bumps/dream/_compiled.so #dream/path
There is no provision for using _compiled.so in a frozen application.
Run with no more than 64 OMP threads. If the number of processors is more than 64, then use:
OMP_NUM_THREADS=64 ./run.py ...
I don't know how OMP_NUM_THREADS behaves if it is larger than the number of processors.