I would like to disambiguate two potential meanings of "parallel writing" with MPI. Here is an example scenario. Can you let me know which behavior to expect from h5py + MPI?
Scenario: A pre-existing H5 file with 50 datasets exists, each with space preallocated. There are 50 different MPI processes running, and at the exact same instant, each attempts to write a numpy array to one of the 50 datasets.
In this scenario, which of the following would occur?
1. All 50 writes happen simultaneously, resulting in an approximately 50x speedup for writing the data to the file.
2. Even though the 50 writes are initiated by distinct processes, they still happen sequentially in time, resulting in no appreciable speedup for writing the data to the file.
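For concreteness, here is a minimal sketch of the pattern I mean (this assumes a parallel-enabled h5py build; the file name, dataset names, and array shape are made up for illustration):

```python
# Run with e.g. `mpiexec -n 50 python write_datasets.py`.
# Assumes "output.h5" already contains 50 preallocated datasets
# named "data_00" .. "data_49", each of shape (1000, 1000).
import numpy as np
import h5py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Every rank opens the same file collectively with the MPI-IO driver.
with h5py.File("output.h5", "r+", driver="mpio", comm=comm) as f:
    data = np.random.rand(1000, 1000)       # the in-memory numpy array
    f[f"data_{rank:02d}"][...] = data       # each rank writes its own dataset
```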
The reason I ask is that I have an ingestion script that uses h5py+MPI to write a number of datasets into a single H5 file. Running with 1 process or 75 processes produces identical output. However, the 75-process version is only about 3x faster than the 1-process version, and the single line that writes the in-memory numpy array to the precreated, presized dataset in the single output H5 file accounts for virtually the entire runtime.
This makes me suspect that under the hood, even though h5py+MPI can collect "requests" to write to datasets from multiple independent processes in parallel, each individual write to the H5 file still happens sequentially in reality. So if N parallel processes are simultaneously trying to do their writes, they won't realize an Nx speedup if the bottleneck is the write to the H5 file itself.
Can you let me know if I'm understanding h5py+MPI's parallel write correctly?
Thank you.
I believe that with MPI you can genuinely do the writes in parallel in the sense that HDF5 doesn't need to serialise the operations of different processes. But of course there are still limitations in the operating system and the hardware, so it's unlikely that using 75 processes is ever going to be 75x faster.
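As a hedged illustration (the file name, dataset name, and shapes here are hypothetical, and this assumes h5py was built against a parallel HDF5), collective I/O is one way to let the MPI-IO layer coordinate the ranks' writes rather than issuing them independently:

```python
# Run with e.g. `mpiexec -n 4 python collective_write.py`.
import numpy as np
import h5py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

with h5py.File("collective.h5", "w", driver="mpio", comm=comm) as f:
    # Dataset creation modifies file metadata, so all ranks must call it.
    dset = f.create_dataset("data", (size, 1000), dtype="f8")
    # In collective mode, every rank participates in the write call,
    # which lets the MPI-IO layer aggregate the operations.
    with dset.collective:
        dset[rank, :] = np.full(1000, rank, dtype="f8")
```

Whether collective mode actually helps depends on the MPI implementation and the filesystem; on a single local disk, the hardware is usually the bottleneck regardless of how many processes are writing.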