Add GEFS regression test suite from EP5r2 configuration/case #2442

Draft: wants to merge 88 commits into base: develop

NickSzapiro-NOAA
Collaborator

@NickSzapiro-NOAA NickSzapiro-NOAA commented Sep 19, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR updates the cpld_bmark_p8 tests to a prototype GEFS test case of fully coupled s2swa+IAU+stochastics, with configuration and warm starts from restarts of EP5r2 ensemble member 1 for 2021-03-25 06Z.

The EP5r2 test case was kindly provided by @bingfu-NOAA via @junwang-noaa with aerosol input data and configurations from @lipan-NOAA. The wave element mask has been modified as in TODO.

A separate INPUTDATA_ROOT_BMIC is no longer needed and is removed.

The test suite runs basic reproducibility/quality checks as regression tests, particularly:

  • control reproduces itself
  • restart reproduces control
  • changing number of tasks reproduces control
  • changing number of threads reproduces control
  • Intel debug version reproduces itself
  • GNU and GNU debug versions run
    • GNU debug on Hera fails, likely because of an OpenMPI error:
      140: The OSC pt2pt component does not support MPI_THREAD_MULTIPLE in this release.
      140: Workarounds are to run on a single node, or to use a system with an RDMA
      140: capable network such as Infiniband.
    • GNU and GNU debug on Hercules fail with a NetCDF HDF error:
      Error in handle_err: get_var3_r4 get_vara_real delp_inc NetCDF: HDF error

with caveats across supported platforms. TODO see pre-test matrix
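As an illustration of what "reproduces control" means in the checks above, here is a minimal sketch that compares output files from two runs byte for byte. It is not the comparison used by the UFS regression test scripts; the function names (`file_digest`, `compare_runs`, `reproduces`) and the idea of passing an explicit file list are assumptions made for this example. A byte-wise comparison is deliberately strict, so files that embed run-specific metadata would be reported as different even if the field data agree.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(control_dir: Path, test_dir: Path,
                 filenames: List[str]) -> Dict[str, str]:
    """Classify each file as 'identical', 'different', or 'missing'.

    control_dir and test_dir are hypothetical run directories; filenames
    lists the outputs that both runs are expected to produce.
    """
    results = {}
    for name in filenames:
        control_file = control_dir / name
        test_file = test_dir / name
        if not control_file.is_file() or not test_file.is_file():
            results[name] = "missing"
        elif file_digest(control_file) == file_digest(test_file):
            results[name] = "identical"
        else:
            results[name] = "different"
    return results


def reproduces(results: Dict[str, str]) -> bool:
    """A test run reproduces control only if every file is identical."""
    return all(status == "identical" for status in results.values())
```

Under this sketch, the restart, task-count, and thread-count checks would each call `compare_runs` against the same control directory, and a check passes only when `reproduces` returns `True`.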

  • All tests pass on supported platforms

    • Hera (intel)
    • Hercules (intel)
    • Gaea (intel)
    • Derecho (intel)
      With increased WAV_tasks, hangs 30 minutes into simulation
    • Larger issue of GNU support
  • No major diffs from GEFS workflow configuration
    I have tried to note intentional differences. Please let us know if you see differences.

    Depending on aerosol coupling, GOCART .rc files and ExtData directory structure may be revised for consistency with global-workflow.

    This benchmark configuration and case may be updated, particularly with GEFS reforecast or UFS case study.

Test failures may be better served via library/platform support and work will continue in follow up issues, including:

  • [] Larger issue of GNU support
  • [] Sensitivity to spack-stack updates
  • [] individual test results

TODO: Scripts need finalizing once filepaths are in shared space.
Input data is currently in user space on hera:
/scratch1/NCEPDEV/nems/Nick.Szapiro/tasks/input_data/gefs.v13/RT_GEFS/

Commit Message:

* UFSWM - Add GEFS regression test suite from EP5r2 configuration/case

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

  • None

UFSWM Blocking Dependencies:


Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.

Input data Changes:

  • New input data.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

NickSzapiro-NOAA and others added 30 commits May 6, 2024 06:24
@NickSzapiro-NOAA
Collaborator Author

Hi @jkbk2004 . If there are no input data changes coming up, can we stage input data for new test cases? Maybe @[INPUTDATA_ROOT]/GEFS/ is a good place.

It's on hera at /scratch1/NCEPDEV/nems/Nick.Szapiro/tasks/input_data/gefs.v13/RT_GEFS/ . The contents of WW3 subdirectory probably belong under INPUTDATA_ROOT_WW3 instead

@edwardhartnett
Contributor

Are the versions of PIO, netcdf-fortran, netcdf-c, and HDF5 all the same on hercules as on other platforms?

@NickSzapiro-NOAA
Collaborator Author

NickSzapiro-NOAA commented Dec 2, 2024

> Are the versions of PIO, netcdf-fortran, netcdf-c, and HDF5 all the same on hercules as on other platforms?

Hi @edwardhartnett. Hercules is maybe the only platform reliably running gnu tests right now. I see the same gnu and intel libraries via spack-stack-1.6.0 on hercules:
gcc/12.2.0: parallelio-2.5.10 parallel-netcdf-1.12.2 netcdf-fortran-4.6.1 netcdf-c-4.9.2 hdf5-1.14.0
intel/2021.9.0: parallelio-2.5.10 parallel-netcdf-1.12.2 netcdf-fortran-4.6.1 netcdf-c-4.9.2 hdf5-1.14.0
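A comparison like the one above can be checked mechanically. The sketch below parses the two `compiler: package-version ...` lines exactly as written in this comment and reports any package whose version differs between the toolchains. The helper names (`parse_stack`, `diff_stacks`) are hypothetical, and the line format is assumed from this comment rather than from any spack-stack tool output.

```python
from typing import Dict, Optional, Tuple


def parse_stack(line: str) -> Tuple[str, Dict[str, str]]:
    """Split 'compiler: name-version ...' into (compiler, {name: version})."""
    compiler, _, packages = line.partition(":")
    versions = {}
    for token in packages.split():
        # The version is whatever follows the last hyphen, so names that
        # contain hyphens themselves (netcdf-fortran) stay intact.
        name, _, version = token.rpartition("-")
        versions[name] = version
    return compiler.strip(), versions


def diff_stacks(a: Dict[str, str],
                b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return packages whose versions differ (None if absent on one side)."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


gnu_line = ("gcc/12.2.0: parallelio-2.5.10 parallel-netcdf-1.12.2 "
            "netcdf-fortran-4.6.1 netcdf-c-4.9.2 hdf5-1.14.0")
intel_line = ("intel/2021.9.0: parallelio-2.5.10 parallel-netcdf-1.12.2 "
              "netcdf-fortran-4.6.1 netcdf-c-4.9.2 hdf5-1.14.0")

_, gnu = parse_stack(gnu_line)
_, intel = parse_stack(intel_line)
print(diff_stacks(gnu, intel))  # prints {} because the versions listed match
```

An empty result only says that the listed version strings agree; it says nothing about how each library was built for each compiler.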

@edwardhartnett
Contributor

OK, I thought we had switched to hdf5-1.14.3?

Some parallel I/O bugs were fixed in HDF5-1.14.3.

@NickSzapiro-NOAA
Collaborator Author

NickSzapiro-NOAA commented Dec 5, 2024

> OK, I thought we had switched to hdf5-1.14.3?
>
> Some parallel I/O bugs were fixed in HDF5-1.14.3.

@edwardhartnett We can test with updated packages. What versions would you suggest?

@edwardhartnett
Contributor

hdf5-1.14.3

Development

Successfully merging this pull request may close these issues.

  • Update cpld_bmark_p8 with GEFSv13 EP5 configuration
  • Add RT test for gocart_on, gccpp_on, nasa_on
8 participants