
Extend the build capability of gdasapp on compute nodes (in addition to on frontend nodes) #1328

Open
aerorahul opened this issue Oct 11, 2024 · 4 comments

Comments

@aerorahul
Contributor

To speed up the building of the global application, it would be nice to have gdasapp built on a compute node.
Presently, the configure stage of the gdasapp build is set to download data (testdata, fixdata?) from an online store. The online store is not reachable from compute nodes. (We could try to build on service nodes and check whether those have access to online stores.)

One block of code that enables building on a compute node on Hera is here. This check could be extended to Hercules/Orion/WCOSS2 as a quick fix.
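One hypothetical way to widen the single-machine check quoted below is to fold the supported machines into one test. The variable name `BUILD_TARGET` comes from the check in build.sh; the expanded target list and the function wrapper here are illustrative assumptions, not the actual change:

```shell
#!/bin/bash
# Hypothetical sketch: extend the hera-only check to the other platforms.
# The accepted-target list is an assumption for illustration.
supports_compute_build() {
  case "$1" in
    hera|hercules|orion|wcoss2) return 0 ;;
    *) return 1 ;;
  esac
}

# Example branch, mirroring the structure of the check in build.sh.
if supports_compute_build "${BUILD_TARGET:-hera}"; then
  echo "compute-node build path"
else
  echo "frontend-only build path"
fi
```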

@aerorahul
Contributor Author

Any updates on this are appreciated.

@RussTreadon-NOAA
Contributor

The restriction

if [[ $BUILD_TARGET == 'hera' ]]; then

in build.sh has been removed.

This change was included in GDASApp PR #1327.

I do not know if anyone has tried to execute build.sh on a compute node since this change was committed to GDASApp develop.

@RussTreadon-NOAA
Contributor

Compute node build test

Do the following on WCOSS2 (Dogwood), Hera, and Orion

  1. clone g-w develop at 7090cff
  2. create simple script to run build_gdas.sh via batch job. For example, below is the script used on Hera
#!/bin/bash

#SBATCH -A da-cpu
#SBATCH -o test_build.o%J
#SBATCH -J test_build
#SBATCH -q batch
#SBATCH --partition=hera
#SBATCH -t 2:00:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --export=NONE

set -ax

export repodir=/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/test
cd $repodir/sorc
mkdir -p logs
date
./build_gdas.sh > logs/build_gdas.log 2>&1
date

I'm not sure what to request for tasks-per-node or cpus-per-task. The GDASApp compile is run with make -j 8. Should we specify --cpus-per-task=8 instead?
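Since the compile is driven by a single `make -j 8` process, one plausible resource request (an untested assumption, not something verified in this thread) is a single task with eight CPUs, so all eight make jobs land on cores belonging to that one task:

```shell
# Hypothetical alternative SBATCH request matching make -j 8.
# Values are assumptions and were not tested in this thread.
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=8
```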

  3. submit the batch job. Below are the batch jobs which were run on

Dogwood

dbqs01: 
                                                                 Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
209006364.dbqs01     russ.tr* dev      test_build 238293   1   8    --  02:00 R 00:08

Hera

     JOBID PARTITION  NAME                     USER             STATE        TIME TIME_LIMIT NODES NODELIST(REASON)
   3760275 hera       test_build               Russ.Treadon     RUNNING      9:17    2:00:00     1 h24c55

Orion

       JOBID PARTITION                      NAME               USER ST       TIME  NODES     NODELIST(REASON) QOS
    19249876     orion                test_build           rtreadon  R       7:14      1          orion-01-45 batch

build_gdas.sh successfully ran to completion on each machine with the following build times (hours:minutes:seconds):

  • Dogwood: 00:32:22
  • Hera: 00:44:41
  • Orion: 00:56:00

Again, it may be possible to lower the build times via adjustments to the job configuration. This was not an optimization test; I just wanted to see whether GDASApp could be built on compute nodes via a batch script. It can.

@RussTreadon-NOAA
Contributor

This issue is relevant to g-w issue #3131.
