Question on building gdasapp on orion #1418

Open
shlyaeva opened this issue Dec 16, 2024 · 11 comments
Labels
question Further information is requested

Comments

@shlyaeva
Collaborator

My assumption is that I can build gdasapp 3 ways:

  1. via global-workflow build script, e.g. global-workflow/sorc/build_all.sh -gu (which rebuilds everything in gw)
  2. via gdasapp build script, e.g. global-workflow/sorc/gdas/build.sh (which rebuilds everything in gdasapp)
  3. via loading gdasapp modules and running make in the build directory, e.g.: cd global-workflow/sorc/gdas/; module use modulefiles/GDAS; module load $machine.$comp; cd build/gdas; make -j12

I think I have successfully used many combinations of these three ways with different machines (orion, hera, hercules) at different times in the past month, but now I'm struggling to build on orion with 2 and 3. I wanted to check if that's expected behavior currently, or whether I'm doing something wrong.

For 2, the code builds, but only a small subset of the tests is available in the gdas build directory (none of the gw-ci tests are there).

For 3, I fail to load modules on orion with the following error:

orion-login-3[16] ashlyaev$ module use modulefiles/GDAS/
orion-login-3[17] ashlyaev$ module load orion.intel
Lmod has detected the following error:  The following module(s) are unknown: "libxpm/4.11.0"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "libxpm/4.11.0"

Also make sure that all modulefiles written in TCL start with the string #%Module

Executing this command requires loading "libxpm/4.11.0" which failed while processing the following module(s):

    Module fullname  Module Filename
    ---------------  ---------------
    orion.intel      /work2/noaa/da/ashlyaev/global-workflow/sorc/gdas.cd/modulefiles/GDAS/orion.intel.lua
While processing the following module(s):
    Module fullname  Module Filename
    ---------------  ---------------
    orion.intel      /work2/noaa/da/ashlyaev/global-workflow/sorc/gdas.cd/modulefiles/GDAS/orion.intel.lua

If I naively remove the offending module, I get

orion-login-3[13] ashlyaev$ module load orion.intel
Lmod has detected the following error:  Unable to load module because of error when evaluating modulefile:
     /work2/noaa/da/ashlyaev/global-workflow/sorc/gdas.cd/modulefiles/GDAS/orion.intel.lua: [string "help([[..."]:91: attempt to concatenate a boolean value (local 'pkgVersion')
     Please check the modulefile and especially if there is a line number specified in the above message
While processing the following module(s):
    Module fullname  Module Filename
    ---------------  ---------------
    orion.intel      /work2/noaa/da/ashlyaev/global-workflow/sorc/gdas.cd/modulefiles/GDAS/orion.intel.lua

and I think I have seen a similar message on a different platform before (probably hera? but I'm not sure).

Am I doing something incorrectly when I try to build with 2 and 3 on orion? Is there a recommended way for the quickest rebuild of gdasapp (without rebuilding everything from scratch)?

@shlyaeva shlyaeva added the question Further information is requested label Dec 16, 2024
@guillaumevernieres
Contributor

@DavidNew-NOAA recently updated the gw-ci tests. See

option(TEST_GSI "Enable the GFSv17 GSI tests" ON)
for how to trigger your favorite gw-ci ctest.

I gave up on Orion ages ago. I only run on Hercules or Hera.

@shlyaeva
Collaborator Author

@guillaumevernieres, yes, I have the option ON, and when I run global-workflow/sorc/build_all.sh -gu it builds properly, but when I run global-workflow/sorc/gdas/build.sh those tests are missing. I'm trying to find a way to not spend an hour rebuilding all of the code when I make trivial changes in soca.
Good point on hera and hercules, I'm building there in parallel as well. I have a feeling I was hitting similar issues there before too, but I'll report back if it's more than a feeling :)

@shlyaeva
Collaborator Author

I do hit an issue loading gdasapp modules on hera as well:

[Anna.V.Shlyaeva@hfe02 gdas.cd]$ module load hera.intel
Lmod has detected the following error:  Unable to load module because of error when evaluating modulefile:
     /scratch1/NCEPDEV/da/Anna.Shlyaeva/global-workflow/sorc/gdas.cd/modulefiles/GDAS/hera.intel.lua: [string "help([[..."]:95: attempt to concatenate a boolean value (local 'pkgVersion')
     Please check the modulefile and especially if there is a line number specified in the above message
While processing the following module(s):
    Module fullname  Module Filename
    ---------------  ---------------
    hera.intel       /scratch1/NCEPDEV/da/Anna.Shlyaeva/global-workflow/sorc/gdas.cd/modulefiles/GDAS/hera.intel.lua

@shlyaeva
Collaborator Author

And on hercules:

[ashlyaev@hercules-login-4 gdas.cd]$ module load hercules.intel
Lmod has detected the following error:  Unable to load module because of error when evaluating modulefile:
     /work2/noaa/da/ashlyaev/hercules/global-workflow/sorc/gdas.cd/modulefiles/GDAS/hercules.intel.lua: [string "help([[..."]:92: attempt to concatenate a boolean value (local 'pkgVersion')
     Please check the modulefile and especially if there is a line number specified in the above message
While processing the following module(s):
    Module fullname  Module Filename
    ---------------  ---------------
    hercules.intel   /work2/noaa/da/ashlyaev/hercules/global-workflow/sorc/gdas.cd/modulefiles/GDAS/hercules.intel.lua

@RussTreadon-NOAA
Contributor

@shlyaeva: We need to build GDASApp inside g-w in order for g-w based GDASApp ctests to be active. Since I usually run g-w based ctests, I begin my development cycle with your option 1. If subsequent development requires me to make changes to GDASApp source code or ctests, I then flip over to option 3 to rebuild GDASApp.

As a test I cloned GDASApp develop at d6097af on Hera, Hercules, and Orion. Below is the behavior I see when I load GDAS/${machine}.intel:

Hera

Hera(hfe03):/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/develop$ module list

Currently Loaded Modules:
  1) rocoto/1.3.7   2) cmake/3.28.1



Hera(hfe03):/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/develop$ module use /scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/develop/modulefiles/
Hera(hfe03):/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/develop$ module load GDAS/hera.intel

The following have been reloaded with a version change:
  1) cmake/3.28.1 => cmake/3.23.1     2) rocoto/1.3.7 => rocoto/1.3.6

Hercules

hercules-login-4:/work/noaa/da/rtreadon/git/GDASApp/develop$ module list

Currently Loaded Modules:
  1) contrib/0.1     3) rocoto/1.3.7    5) glx/1.4              7) zlib/1.2.13     9) qt/5.15.8
  2) noaatools/3.1   4) git-lfs/3.1.2   6) libxkbcommon/1.4.0   8) sqlite/3.39.4  10) cmake/3.26.3



hercules-login-4:/work/noaa/da/rtreadon/git/GDASApp/develop$ module use /work/noaa/da/rtreadon/git/GDASApp/develop/modulefiles/
hercules-login-4:/work/noaa/da/rtreadon/git/GDASApp/develop$ module load GDAS/hercules.intel
-------------------------------------------------------------------------------------------------------------------------------------
The following dependent module(s) are not currently loaded: py-numpy/1.22.3 (required by: bufr/12.0.1), python/3.10.13 (required by: boost/1.84.0, bufr/12.0.1, fckit/0.11.0, atlas/0.36.0, py-pybind11/2.11.0)
-------------------------------------------------------------------------------------------------------------------------------------

Due to MODULEPATH changes, the following have been reloaded:
  1) git-lfs/3.1.2     2) zlib/1.2.13

The following have been reloaded with a version change:
  1) cmake/3.26.3 => cmake/3.23.1

Orion

orion-login-4:/work/noaa/da/rtreadon/git/GDASApp/develop$ module list

Currently Loaded Modules:
  1) contrib/0.1     3) rocoto/1.3.7    5) glx/1.4              7) zlib/1.2.13     9) qt/5.15.8
  2) noaatools/3.1   4) git-lfs/3.1.2   6) libxkbcommon/1.4.0   8) sqlite/3.39.4  10) cmake/3.26.3



orion-login-4:/work/noaa/da/rtreadon/git/GDASApp/develop$ module use /work/noaa/da/rtreadon/git/GDASApp/develop/modulefiles/
orion-login-4:/work/noaa/da/rtreadon/git/GDASApp/develop$ module load GDAS/orion.intel
--------------------------------------------------------------------------------------------------------------------------------------------------------------
The following dependent module(s) are not currently loaded: py-numpy/1.22.3 (required by: bufr/12.0.1), python/3.10.13 (required by: boost/1.83.0, bufr/12.0.1, fckit/0.11.0, atlas/0.35.1, py-pybind11/2.11.0)
--------------------------------------------------------------------------------------------------------------------------------------------------------------

Due to MODULEPATH changes, the following have been reloaded:
  1) git-lfs/3.1.2     2) zlib/1.2.13

The following have been reloaded with a version change:
  1) cmake/3.26.3 => cmake/3.23.1

@shlyaeva
Collaborator Author

Thank you @RussTreadon-NOAA, that's very helpful!

On hercules, if I do module use modulefiles/GDAS; module load hercules.intel, I get an error as reported above:

(gdasapp) [ashlyaev@hercules-login-3 gdas.cd]$ module use modulefiles/GDAS
(gdasapp) [ashlyaev@hercules-login-3 gdas.cd]$ module load hercules.intel
Lmod has detected the following error:  Unable to load module because of error when evaluating modulefile:
     /work2/noaa/da/ashlyaev/hercules/global-workflow/sorc/gdas.cd/modulefiles/GDAS/hercules.intel.lua: [string "help([[..."]:92: attempt to concatenate a boolean value (local 'pkgVersion')
     Please check the modulefile and especially if there is a line number specified in the above message
While processing the following module(s):
    Module fullname  Module Filename
    ---------------  ---------------
    hercules.intel   /work2/noaa/da/ashlyaev/hercules/global-workflow/sorc/gdas.cd/modulefiles/GDAS/hercules.intel.lua

But if I follow what you do, and do module use modulefiles; module load GDAS/hercules.intel that works:

[ashlyaev@hercules-login-3 gdas.cd]$ module use modulefiles/
[ashlyaev@hercules-login-3 gdas.cd]$ module load GDAS/hercules.intel
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The following dependent module(s) are not currently loaded: py-numpy/1.22.3 (required by: bufr/12.0.1)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Due to MODULEPATH changes, the following have been reloaded:
  1) gettext/0.21.1     2) libxcrypt/4.4.35     3) libyaml/0.2.5     4) openblas/0.3.24     5) py-markupsafe/2.1.3     6) py-pyyaml/6.0     7) sqlite/3.43.2     8) stack-intel/2021.9.0     9) tar/1.34    10) util-linux-uuid/2.38.1    11) zlib/1.2.13    12) zstd/1.5.2

The following have been reloaded with a version change:
  1) pigz/2.7 => pigz/2.8     2) py-jinja2/3.1.2 => py-jinja2/3.0.3     3) python/3.11.6 => python/3.10.13

Not sure why, but since this resolves this for me, I'm closing the issue. Thank you!
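For reference, the working sequence above can be derived from the path of a modulefile. The following is only a sketch: `module_cmds_for` is a hypothetical helper name, not part of GDASApp or Lmod, and it assumes the modulefile sits at `<root>/<package>/<machine>.<compiler>.lua` as in this thread. It prints the commands rather than running them.

```shell
# Hypothetical helper (not part of GDASApp or Lmod): print the
# "module use" / "module load" pair that puts the package directory
# (e.g. GDAS) into the module name instead of the module path.
module_cmds_for() {
    file=$1                         # e.g. modulefiles/GDAS/hercules.intel.lua
    stem=$(basename "$file" .lua)   # hercules.intel
    pkgdir=$(dirname "$file")       # modulefiles/GDAS
    pkg=$(basename "$pkgdir")       # GDAS
    root=$(dirname "$pkgdir")       # modulefiles
    printf 'module use %s\nmodule load %s/%s\n' "$root" "$pkg" "$stem"
}

module_cmds_for modulefiles/GDAS/hercules.intel.lua
```

For the path shown, this prints `module use modulefiles` followed by `module load GDAS/hercules.intel`, which matches the commands that loaded cleanly on hercules.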

@RussTreadon-NOAA
Contributor

@shlyaeva, I replicated the behavior you observed by adding /GDAS to the end of the module use line.

After a few Google searches and some trial and error, I stumbled across the following change to GDAS/${machine}.intel.lua, which works on Orion and Hera:

 whatis("Name: ".. pkgName)
-whatis("Version: ".. pkgVersion)
+whatis("Version: ".. tostring(pkgVersion))
 whatis("Category: GDASApp")

With this change to hera.intel.lua no error occurs when including /GDAS on the module use path.

Hera(hfe08):/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/stable/modulefiles$ module list
No modules loaded

Hera(hfe08):/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/stable/modulefiles$ module use /scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/stable/modulefiles/GDAS

Hera(hfe08):/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/stable/modulefiles$ module load hera.intel

Hera(hfe08):/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/stable/modulefiles$ module list

Currently Loaded Modules:
  1) intel/2022.1.2                   20) netcdf-fortran/4.6.1    39) openblas/0.3.24   58) libxmu/1.1.4           77) py-scipy/1.11.3
  2) stack-intel/2021.5.0             21) antlr/2.7.7             40) eckit/1.24.5      59) libxpm/4.11.0          78) py-packaging/23.1
  3) impi/2022.1.2                    22) gsl/2.7.1               41) fftw/3.3.10       60) libxaw/1.0.13          79) py-bottleneck/1.3.7
  4) stack-intel-oneapi-mpi/2021.5.1  23) nco/5.0.6               42) fckit/0.11.0      61) udunits/2.2.28         80) py-numexpr/2.8.4
  5) cmake/3.23.1                     24) parallelio/2.5.10       43) fiat/1.2.0        62) ncview/2.1.9           81) py-et-xmlfile/1.0.1
  6) gettext/0.19.8.1                 25) wget/1.14               44) fms/2023.04       63) netcdf-cxx4/4.3.1      82) py-openpyxl/3.1.2
  7) pcre2/10.42                      26) libxcrypt/4.4.35        45) esmf/8.6.0        64) json/3.10.5            83) py-six/1.16.0
  8) nghttp2/1.57.0                   27) sqlite/3.43.2           46) ectrans/1.2.0     65) rocoto/1.3.6           84) py-python-dateutil/2.8.2
  9) curl/8.4.0                       28) util-linux-uuid/2.38.1  47) qhull/2020.2      66) bacio/2.4.1            85) py-pytz/2023.3
 10) zlib/1.2.13                      29) python/3.10.13          48) atlas/0.35.1      67) w3emc/2.10.0           86) py-pyxlsb/1.0.10
 11) git/2.18.0                       30) boost/1.83.0            49) sp/2.5.0          68) prod_util/2.1.1        87) py-xlrd/2.0.1
 12) pkg-config/0.27.1                31) py-setuptools/63.4.3    50) gsl-lite/0.37.0   69) py-markupsafe/2.1.3    88) py-xlsxwriter/3.1.7
 13) hdf5/1.14.0                      32) py-numpy/1.22.3         51) libjpeg/2.1.0     70) py-jinja2/3.0.3        89) py-xlwt/1.3.0
 14) parallel-netcdf/1.12.2           33) bufr/12.0.1             52) krb5/1.20.1       71) py-cftime/1.0.3.4      90) py-pandas/1.5.3
 15) snappy/1.1.10                    34) git-lfs/2.10.0          53) libtirpc/1.3.3    72) py-netcdf4/1.5.8       91) py-xarray/2023.7.0
 16) zstd/1.5.2                       35) ecbuild/3.7.2           54) hdf/4.2.15        73) py-pybind11/2.11.0     92) py-f90nml/1.4.3
 17) c-blosc/1.21.5                   36) openjpeg/2.3.1          55) jedi-cmake/1.4.0  74) py-pycodestyle/2.11.0  93) py-pip/23.1.2
 18) netcdf-c/4.9.2                   37) eccodes/2.32.0          56) libpng/1.6.37     75) libyaml/0.2.5          94) py-click/8.1.7
 19) nccmp/1.9.0.1                    38) eigen/3.4.0             57) libxt/1.1.5       76) py-pyyaml/6.0          95) hera.intel

I also tried module use /scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/stable/modulefiles followed by module load GDAS/hera.intel. This also worked.

@shlyaeva
Collaborator Author

@RussTreadon-NOAA nice, thank you for looking into this and finding a fix! 🎉

@RussTreadon-NOAA
Contributor

RussTreadon-NOAA commented Dec 19, 2024

Work for this issue will be done in feature/modulefiles

@RussTreadon-NOAA
Contributor

Using tostring() is not the correct solution. myModuleVersion() returns false because the module does not set the version.

If we focus on Orion, module show GDAS/orion.intel returns

whatis("Name: GDAS")
whatis("Version: orion.intel")
whatis("Category: GDASApp")
whatis("Description: Load all libraries needed for GDASApp")

In contrast, module show orion.intel returns

Lmod has detected the following error:  Unable to load module because of error when evaluating modulefile:
     /work2/noaa/da/rtreadon/git/GDASApp/develop/modulefiles/GDAS/orion.intel.lua: [string "help([[..."]:92: attempt to concatenate a boolean value (local 'pkgVersion')
     Please check the modulefile and especially if there is a line number specified in the above message
While processing the following module(s):
    Module fullname  Module Filename
    ---------------  ---------------
    orion.intel      /work2/noaa/da/rtreadon/git/GDASApp/develop/modulefiles/GDAS/orion.intel.lua

Replacing

-local pkgVersion = myModuleVersion()
+local pkgVersion = tostring(myModuleVersion())

and executing module show orion.intel returns

whatis("Name: orion.intel")
whatis("Version: false")
whatis("Category: GDASApp")
whatis("Description: Load all libraries needed for GDASApp")

With module show GDAS/orion.intel we get myModuleName() = GDAS and myModuleVersion() = orion.intel.
With module show orion.intel we get myModuleName() = orion.intel and myModuleVersion() = false. The false value apparently indicates that the version is undefined.

I cannot find detailed documentation on myModuleName() and myModuleVersion(), but it seems that these two functions take the ${modulefile} passed to module show ${modulefile} and split it, apparently at the "/", with Name being the prefix of ${modulefile} and Version being the suffix of ${modulefile}. When there is no suffix, Version comes back as false.
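The split described above can be mimicked in a few lines of shell. This is only a sketch of the behavior observed in this thread, not Lmod's actual implementation; `split_module_name` is a hypothetical helper, and splitting at the last "/" is an assumption.

```shell
# Hypothetical helper: reproduce the name/version split observed above.
# This is NOT how Lmod is implemented; it only mirrors the results seen here.
split_module_name() {
    full=$1
    case "$full" in
        */*)
            name=${full%/*}       # text before the last "/"
            version=${full##*/}   # text after the last "/"
            ;;
        *)
            name=$full
            version=false         # no "/" means no version part
            ;;
    esac
    printf 'name=%s version=%s\n' "$name" "$version"
}

split_module_name GDAS/orion.intel
split_module_name orion.intel
```

The first call prints `name=GDAS version=orion.intel` and the second prints `name=orion.intel version=false`, matching the two module show results above.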

As a test, try the following change in orion.intel.lua:

 local pkgName    = myModuleName()
-local pkgVersion = myModuleVersion()
+local pkgVersion = myModuleVersion() or "1.0"
 local pkgNameVer = myModuleFullName()

With this change in place, module show orion.intel returns

whatis("Name: orion.intel")
whatis("Version: 1.0")
whatis("Category: GDASApp")
whatis("Description: Load all libraries needed for GDASApp")
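Since the same pattern appears in the hera, hercules, and orion modulefiles, a quick scan can list the files that still call myModuleVersion() without a fallback. This is a rough sketch: `find_unguarded_versions` is a hypothetical helper, and it assumes the fallback is written literally as `myModuleVersion() or ...` on the same line as the call.

```shell
# Hypothetical helper: list *.lua modulefiles in a directory that call
# myModuleVersion() on at least one line without an "or" fallback.
find_unguarded_versions() {
    dir=$1
    for f in "$dir"/*.lua; do
        [ -f "$f" ] || continue   # skip the unexpanded glob when nothing matches
        # First grep keeps lines with the call; second grep succeeds only if
        # at least one of those lines lacks the "or" fallback.
        if grep 'myModuleVersion()' "$f" | grep -qv 'myModuleVersion() or'; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running `find_unguarded_versions modulefiles/GDAS` from the GDASApp checkout would then print one path per modulefile that still needs the change.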

Is this an acceptable solution?

What do you think?

Tagging @CoryMartin-NOAA, @DavidNew-NOAA, @danholdaway, @guillaumevernieres, @shlyaeva for awareness.

@CoryMartin-NOAA
Contributor

I'm okay with this
