install/configure SLURM+MPI for deploying on Azure #60

Open
aculich opened this issue Jul 29, 2015 · 5 comments
Comments

@aculich
Contributor

aculich commented Jul 29, 2015

Enabling this in a Savio-compatible way (http://research-it.berkeley.edu/services/high-performance-computing/system-overview) would allow mobility of code from a local HPC cluster to an Azure cloud-based HPC cluster, without end users having to change their code or submit scripts.

See azure-quickstart-templates#422 for further discussion.

@aculich aculich self-assigned this Jul 29, 2015
@paciorek
Contributor

How often would a user want to use SLURM on a cloud-based cluster?
Presumably the user would generally have full and exclusive control over
the cloud nodes and therefore not need a scheduler?

They won't be able to use their Savio SLURM script anyway as the QoS and
other details will differ, no?

That said, it would be nice if an MPI submission would just work on a
cloud-based cluster based on a job previously run on Savio.
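
As a rough illustration of that idea, here is a minimal sketch that keeps the MPI part of a submit script fixed and isolates the site-specific SLURM directives (partition, QoS, account). The function name `render_sbatch`, the dictionaries, and every value in them are hypothetical placeholders, not actual Savio or Azure settings; launching with `mpirun` is also an assumption about how the job would be started.

```python
def render_sbatch(job_name, ntasks, command, site_options):
    """Return the text of a SLURM batch script.

    site_options maps sbatch long-option names (e.g. "partition", "qos")
    to values; these are the parts expected to differ between clusters.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={ntasks}",
    ]
    # Site-specific directives, sorted so the output is deterministic.
    for option, value in sorted(site_options.items()):
        lines.append(f"#SBATCH --{option}={value}")
    # Assumption: the MPI program is launched with mpirun on both clusters.
    lines.append(f"mpirun {command}")
    return "\n".join(lines) + "\n"


# Hypothetical values; real partition/QoS/account names come from each site.
LOCAL_SITE = {
    "partition": "example_partition",
    "qos": "example_qos",
    "account": "example_account",
}
CLOUD_SITE = {}  # a single-user cloud cluster may need no QoS or account

local_script = render_sbatch("mpi_job", 4, "./my_mpi_program", LOCAL_SITE)
cloud_script = render_sbatch("mpi_job", 4, "./my_mpi_program", CLOUD_SITE)
```

With this split, the job name, task count, and MPI command are identical in both scripts, and only the `#SBATCH` lines drawn from the site dictionary change.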

We may want to discuss how to prioritize use of AWS, Google, and Azure for
BCE. I would put them in that order, I think.

@aculich
Contributor Author

aculich commented Jul 29, 2015

Since Azure already provides a pre-configured Ubuntu instance running SLURM, my request to them is to simply add MPI to their image.

This addresses the use case of migrating from Savio (or a similarly configured HPC cluster) directly to a cloud environment without changing the code or submit scripts, since changing those or retraining people can take some time.

In the long run there is probably a better strategy than using SLURM, but I'm focused right now on having a very easy migration path for people. An example is our reading group topic this Thursday using Savio, Azure, and AWS.

As for which platforms to prioritize, I am agnostic; I'm really just responding to the demand for each as I hear it.

@ck37
Contributor

ck37 commented Jul 29, 2015

After a brief superficial look, this seems awesome - seems like it would really make it easier for people to migrate Azure <-> Savio. I agree that reducing code changes in a research environment is really the key imo. EC2 <-> Savio would then be ideal.

@paciorek
Contributor

For some parallel and AWS functionality, I wrote some add-on scripts, so an
open question is whether this SLURM+MPI functionality should be in core BCE
or available as an add-on.

@aculich
Contributor Author

aculich commented Jul 31, 2015

Hadn't noticed until now that you had updated the documentation on how to use BCE to include the scripts for the parallel computation tools. I'll give that a shot.

Azure now supports RDMA, though currently only with their SUSE Linux image; they say support for other distributions will follow:

The current release of Azure Linux RDMA supports SUSE Linux Enterprise Server 12 (SLES12). We will continue to work with other Linux distributions and will have more to say about other supported distributions in near future. A SLES 12 image with completely integrated RDMA drivers specifically tuned for HPC workloads is available now in the Azure market place.

@aculich aculich added this to the fall-2015 nice to haves milestone Sep 21, 2015
@aculich aculich removed their assignment Sep 29, 2018