Context: we need to change PSTRID to interleave atm and ocn processes on the same nodes, which would allow us to do coupled k-scale runs almost as quickly as we can currently do atm-only F cases. I therefore see this as a moderately high-priority task.
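To make the interleaving idea concrete, here is a minimal sketch of how a stride could place two components on alternating ranks. It assumes the usual CIME meaning of the settings, where a component's i-th task sits on global rank `ROOTPE + i * PSTRID`; the function names, the 8-ranks-per-node figure, and the task counts are hypothetical and not taken from the failing case.

```python
from collections import Counter


def component_ranks(ntasks, rootpe=0, pstrid=1):
    """Global MPI ranks for one component, assuming rank = rootpe + i * pstrid."""
    if ntasks < 1 or pstrid < 1 or rootpe < 0:
        raise ValueError("ntasks and pstrid must be >= 1, rootpe must be >= 0")
    return [rootpe + i * pstrid for i in range(ntasks)]


def tasks_per_node(ranks, ranks_per_node):
    """Count how many of the given ranks land on each node (block placement assumed)."""
    return dict(Counter(rank // ranks_per_node for rank in ranks))


# Hypothetical layout: 2 nodes with 8 ranks each, atm on even ranks, ocn on odd ranks.
atm = component_ranks(ntasks=8, rootpe=0, pstrid=2)
ocn = component_ranks(ntasks=8, rootpe=1, pstrid=2)

print("atm ranks:", atm)
print("ocn ranks:", ocn)
print("atm tasks per node:", tasks_per_node(atm, ranks_per_node=8))
print("ocn tasks per node:", tasks_per_node(ocn, ranks_per_node=8))
```

With a stride of 2 and root ranks offset by one, both components end up with the same number of tasks on every node, which is the shared-node layout described above.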
Since Sept 4th was a while ago, can we first confirm that the error still happens with current master?
Other thoughts:
- Does this happen regardless of I/O? That is, does the crash happen without any output stream?
- During which timestep does the error show up? Any chance we can infer which subcycle iteration of p3 this was? You may have to increase the log level (in the driver options) to get a bit more info in atm.log.
- Does this happen for every non-default value of PSTRID?
rljacob changed the title from "PSTRID process striding" to "PSTRID process striding broken in EAMxx" on Oct 18, 2024
rljacob changed the title from "PSTRID process striding broken in EAMxx" to "PSTRID process striding broken in EAMxx gpu cases" on Oct 18, 2024
I'm getting run-time property-check errors with non-default PSTRID and am hoping someone can take a look.
runs fine by default on 8 nodes at 4 tasks/node.
If I set the process stride PSTRID=16 (also 4 tasks/node on 8 nodes), I get the errors below.
A similar case works fine on CPUs:
Error:
Path to that run-dir:
This is with the Sep-4 version of master (42ab514).