Replies: 1 comment 1 reply
That's a good question that we would like to know the answer to :) Sadly, we don't have a lot of experience with SLURM and we're not sure how best to configure this. Regarding the multinode situation, my understanding is that …
First of all: I'm loving HyperQueue and its features. 🙏 Since integrating it with our workflow manager AiiDA, it has been an invaluable tool for partially using nodes on clusters that have an exclusive node-job policy, and for avoiding long queue times for the very small jobs in my workflows that need to run on the compute nodes.
I've been having a bit of trouble combining HyperQueue with MPI, however. Below I outline my current approach; it would be great to get some feedback and suggestions!
Running on a single node
For the use cases described above, I've so far been using HQ with an allocation that only uses a single node. Since the calculations I'm running are vastly more efficient with MPI, I typically run an HQ auto-allocation such as:
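The exact command varies per machine, but it is roughly of this shape (a sketch; the partition name and time limit are placeholders):

```bash
# Single-node auto-allocation: HQ keeps a one-node Slurm allocation queued/running
# and starts a worker inside it (partition name and time limit are placeholders).
hq alloc add slurm --time-limit 1h \
    -- --partition=some_partition
```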
And then submit HQ jobs similar to:
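Roughly like the following (a sketch, not my exact files; `job.sh` and the program name are placeholders):

```bash
# Ask HQ for all 128 cores of the node and run job.sh as the task.
hq submit --cpus=128 ./job.sh
```

where `job.sh` essentially does:

```bash
#!/bin/bash
# Launch the MPI program across the cores of the node the worker runs on.
srun --ntasks=128 ./my_mpi_program
```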
This can also be used for submitting multiple jobs in parallel on a single node, but then I typically have to tweak `--oversubscribe`, `--overlap` and `--cpu-bind` (in combination with `$HQ_CPUS`) to make it work. This seems to be cluster-dependent and I can't always get it to work. Is there a better approach I'm missing?
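For reference, the per-task launch I end up with when packing several of these on one node looks roughly like this (a sketch; the program name is a placeholder, and the exact flag combination is the part I keep having to tweak per cluster):

```bash
#!/bin/bash
# $HQ_CPUS is the comma-separated list of CPU IDs HQ assigned to this task.
NTASKS=$(echo "$HQ_CPUS" | tr ',' '\n' | wc -l)

# Pin the MPI ranks to exactly those CPUs; --overlap/--oversubscribe let this
# step coexist with the other HQ tasks sharing the node.
srun --ntasks="$NTASKS" --overlap --oversubscribe \
     --cpu-bind=map_cpu:"$HQ_CPUS" ./my_mpi_program
```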
Running on multiple nodes
Another use case is when I want to run a multi-node Slurm job and run multiple single-node HQ jobs within it. This can be useful when I have a lot of small jobs to run but Slurm is configured to only allow a certain number of jobs in the queue for a partition.
To test this, I was looking at the documentation for manual submission:
https://it4innovations.github.io/hyperqueue/stable/deployment/worker/#deploying-a-worker-using-pbsslurm
And also trying an auto-allocation with multiple nodes:
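Something along these lines (a sketch; I'm not certain this is the intended way to request multiple nodes per allocation):

```bash
# Each allocation should request 4 nodes and start one worker per node
# (partition name and time limit are placeholders).
hq alloc add slurm --time-limit 1h --workers-per-alloc 4 \
    -- --partition=some_partition
```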
Both suggest using `mpirun`/`srun` to run the `worker start` command. Below is the submission script generated by HQ for the auto-allocation.
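Its general shape was roughly the following (an illustrative reconstruction, not the verbatim generated script):

```bash
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --time=01:00:00
# Illustrative reconstruction: one worker per node, launched through srun,
# so the srun job step has 4 tasks in total.
srun --ntasks-per-node=1 hq worker start
```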
When trying this with the HQ job script above, I obviously get into trouble. Calling `srun` within an `srun` step seems dubious, and I'm asking for 128 tasks within a job step that only has 4, so I get an error. Removing `srun` doesn't have the desired effect either, though: HQ is running 4 workers once the allocation starts, but the calculations don't run in parallel, and the one that does run is only using 4 MPI tasks.

My current "solution"
After quite a bit of trial and error (in lieu of understanding and experience), I've come up with a solution that almost works. Basically, I run the same HQ job script as above for the calculations, but do a manual Slurm submission starting four HQ workers in the background:
Full Slurm submission script
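In outline it does something like this (a sketch under assumptions: 4 exclusive nodes, `hq` on the PATH, and the HQ server reachable from the compute nodes; the real script has more setup):

```bash
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --exclusive
#SBATCH --time=01:00:00

# Start one HQ worker per node in the background, then keep the allocation
# alive until the workers exit.
for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    srun --nodes=1 --ntasks=1 --overlap -w "$node" hq worker start &
done
wait
```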
When running 4 jobs with 128 tasks each, the first three run just fine in parallel, with performance similar to a single run submitted directly to Slurm. The fourth, however, fails with an error in its `stderr`.
Interestingly, if I run on 3 nodes with 3 HQ workers, I don't get this issue at all. I'm still looking into running with more nodes, but am queueing quite a bit at the moment.
Again, any suggestions or tips would be most appreciated!