Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for MASK_OUTSIDE_OBCS with MASKING_DEPTH #752

Open
wants to merge 11 commits into
base: dev/gfdl
Choose a base branch
from

Conversation

kshedstrom
Copy link

The MASK_OUTSIDE_OBCS flag doesn't know about the MASKING_DEPTH and this should take care of the problem.

Copy link
Member

@Hallberg-NOAA Hallberg-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes mirror the code setting the MASKING_DEPTH elsewhere in the MOM6 code, and they make sense to me to add them here as well.

@kshedstrom kshedstrom force-pushed the fix_obc_maskingdepth branch from a4ec069 to ef18f5a Compare November 7, 2024 00:30
@kshedstrom
Copy link
Author

I am hopefully done changing things on this branch now.

Copy link

codecov bot commented Nov 8, 2024

Codecov Report

Attention: Patch coverage is 72.50000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 36.65%. Comparing base (31a4d8b) to head (d59cf6d).

Files with missing lines Patch % Lines
src/core/MOM_open_boundary.F90 0.00% 4 Missing ⚠️
src/tracer/MOM_tracer_advect.F90 75.00% 0 Missing and 4 partials ⚠️
src/core/MOM_barotropic.F90 85.00% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           dev/gfdl     #752      +/-   ##
============================================
+ Coverage     36.63%   36.65%   +0.01%     
============================================
  Files           278      278              
  Lines         84143    84182      +39     
  Branches      15833    15851      +18     
============================================
+ Hits          30826    30855      +29     
- Misses        47504    47507       +3     
- Partials       5813     5820       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@adcroft
Copy link
Member

adcroft commented Nov 11, 2024

@kshedstrom In commit 01b0dc4 you updated the way to handle the v-direction which I think makes sense. However, you didn't change the way the u-direction is handled and left it in the form of the previous commit. I don't think this explains the MacOS fails (which @marshallward suspects is a new gnu-compiler options thing) but currently I think it breaks the rotational symmetry rule.

Copy link
Member

@adcroft adcroft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hallberg-NOAA earlier approved the first version, and I agree with Bob that this seems like a good fix. However, a subsequent "better" commit broke symmetry (see #752 (comment)). This seems easy to fix (apply to u- what was done to v-) but in the mean-time I'll mark this as "changes requested".

@kshedstrom
Copy link
Author

The symmetric version worked for EW boundaries, but not NS boundaries. The algorithm is sweeping across lines of i and is inherently non-symmetric. With non-zero turns, it would need fixing, I suppose. I'll have to investigate how the turns are actually done.

@kshedstrom
Copy link
Author

I rotated the Supercritical test and Thomas's test and I stand by my current version. The failure is:

FAIL: Diagnostics tc3.regression.diag have changed.

One version of Thomas's test was spinning up a circulating flow while the current version has zero flow with nothing spinning it up. The previous code didn't have eta zeroed out outside the OBCs. I wouldn't be surprised if those eta values were being used somehow. Do we need to set them to some wacky value and see what happens?

@kshedstrom
Copy link
Author

kshedstrom commented Nov 17, 2024

The Thomas test when rotated runs when compiled for debugging, but not when compiled for repro. It fails with:

[chinook04:2911387:0:2911387] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:2911387) ====
 0 0x0000000000012d20 __funlockfile()  :0  
 1 0x0000000001183528 __mpp_domains_mod_MOD_mpp_update_domain2d_r8_3dv()  //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/FMS2/mpp/include/mpp_update_domains2D.fh:518
 2 0x0000000000a69d5e __mom_domain_infra_MOD_pass_vector_3d()  //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/config_src/infra/FMS2/MOM_domain_infra.F90:709
 3 0x000000000067957e __mom_open_boundary_MOD_open_boundary_init()  //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/src/core/MOM_open_boundary.F90:1941
 4 0x0000000000c17638 __mom_state_initialization_MOD_mom_initialize_state()  //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/src/initialization/MOM_state_initialization.F90:606
 5 0x0000000000924c7b __mom_MOD_initialize_mom()  //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/src/core/MOM.F90:2979
 6 0x00000000005a2e70 MAIN__()  //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/config_src/drivers/solo_driver/MOM_driver.F90:280
 7 0x000000000040761d main()  //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/config_src/drivers/solo_driver/MOM_driver.F90:27
 8 0x000000000003a7e5 __libc_start_main()  ???:0
 9 0x000000000040765e _start()  ???:0
=================================

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

The line in question is:

  if (OBC%radiation_BCs_exist_globally) call pass_vector(OBC%rx_normal, OBC%ry_normal, G%Domain, &
                     To_All+Scalar_Pair)

At this point in the run, OBC%rx_normal and ry_normal have not been allocated - when the rotation is in play, it is allocated otherwise.

@kshedstrom
Copy link
Author

kshedstrom commented Nov 17, 2024

In MOM_state_initialization, there is a CS%OBC which has the r[xy]_normal allocated. There is also an OBC_in which does not have them allocated (in the rotated case).

@kshedstrom
Copy link
Author

Also, Thomas' test is sensitive to zeroing out the outside eta, but none of the rest of my tests are. It must be something about the OBC choices he picked. As for the other issue, perhaps @marshallward knows why the rotated MOM_state_initialization gets one version of the OBC structure and the nonrotated ones get the other?

@kshedstrom
Copy link
Author

I ran dueling debuggers and the answers do match inside the domain, spinning up a gyre and all. The Thomas test has a positive eta_outside in the Flather OBC. That drives fluid from outside to the inside. It sucks water from the point just outside the boundary, causing eta to get lower just outside the boundary without the MOM_barotropic fix, eventually making it lower than the ocean bottom.

I still stand by my fix, except for the rotational weirdness.

@kshedstrom
Copy link
Author

I'm trying a pointer fix to the OBC data structure problem. Things seem to be running...

@kshedstrom
Copy link
Author

Running, maybe, but these are the diffs for the not rotated vs rotated Thomas test:

diff --git a/ocean_only/interior_OBCs/ocean.stats.gnu b/ocean_only/interior_OBCs/ocean.stats.gnu
index f7ee4fd9..fad32331 100644
--- a/ocean_only/interior_OBCs/ocean.stats.gnu
+++ b/ocean_only/interior_OBCs/ocean.stats.gnu
@@ -1,7 +1,7 @@
   Step,       Day,  Truncs,      Energy/Mass,      Maximum CFL,  Mean Sea Level,  Total Mass,  Mean Salin, Mean Temp, Frac Mass Err,   Salin Err,    Temp Err
             [days]                 [m2 s-2]           [Nondim]       [m]             [kg]         [PSU]      [degC]       [Nondim]        [PSU]        [degC]
      0,  726482.000,     0, En 1.2731932183656516E-25, CFL  0.00000, SL  5.0974E-13, M 1.40039E+14, S 17.5000, T 13.5000, Me  0.00E+00, Se  0.00E+00, Te  0.00E+00
-    36,  726482.250,     0, En 9.5246101764079154E-04, CFL  0.16364, SL  9.8685E-02, M 1.41421E+14, S 17.6380, T 13.4429, Me  9.77E-03, Se  3.09E-01, Te  7.48E-02
-    72,  726482.500,     0, En 1.7087457679639225E-03, CFL  0.19862, SL  9.6871E-02, M 1.41396E+14, S 17.8932, T 13.3365, Me -1.80E-04, Se  2.52E-01, Te -1.09E-01
-   108,  726482.750,     0, En 1.3284186657244862E-03, CFL  0.17970, SL  1.0015E-01, M 1.41442E+14, S 18.0322, T 13.2783, Me  3.24E-04, Se  1.45E-01, Te -5.39E-02
-   144,  726483.000,     0, En 1.1744820816381933E-03, CFL  0.17486, SL  1.0224E-01, M 1.41471E+14, S 18.0998, T 13.2499, Me  2.07E-04, Se  7.13E-02, Te -2.56E-02
+    36,  726482.250,     0, En 1.2731932183656516E-25, CFL  0.00000, SL  5.0974E-13, M 1.40039E+14, S 17.5000, T 13.5000, Me  0.00E+00, Se -1.95E-14, Te -5.74E-15
+    72,  726482.500,     0, En 1.2731932183656516E-25, CFL  0.00000, SL  5.0974E-13, M 1.40039E+14, S 17.5000, T 13.5000, Me  0.00E+00, Se -1.11E-14, Te -6.62E-15
+   108,  726482.750,     0, En 1.2731932183656516E-25, CFL  0.00000, SL  5.0974E-13, M 1.40039E+14, S 17.5000, T 13.5000, Me  0.00E+00, Se -1.48E-14, Te -8.96E-15
+   144,  726483.000,     0, En 1.2731932183656516E-25, CFL  0.00000, SL  5.0974E-13, M 1.40039E+14, S 17.5000, T 13.5000, Me  0.00E+00, Se -8.73E-15, Te -5.17E-15

Maybe I'll figure it out tomorrow.

@kshedstrom
Copy link
Author

For the non-rotated case, OBC%segment(1)%field(1)%buffer_dst is allocated and set to the SSH outside value.
For the rotated case, it is not allocated. Then in MOM_barotropic, BT_OBC%SSH_outer_u is all zeroes instead of having the correct value at the boundaries.

@kshedstrom
Copy link
Author

Oh, it got allocated, set to the appropriate value, then decallocated.

@kshedstrom
Copy link
Author

Is this where it populated the whole OBC_in structure, then gets rid of it as part of the rotation? @marshallward

@kshedstrom
Copy link
Author

rotate_OBC_segment_data is called after the segment fields with a value use the value to fill buffer_dst. It copies over the value, but not the filled buffer_dst.

@marshallward
Copy link
Member

Sorry @kshedstrom I was away last week and missed these pings. I will also try to look into this with you.

@kshedstrom
Copy link
Author

Something is going on with the tracer reservoirs. At some point during initialization, they have the same values. Later, before step_MOM_dynamics, they don't. In fact the "vanilla" case has updated to T=7 and the rotated case has not updated and has somehow picked up that is_initialized is .false..

@kshedstrom
Copy link
Author

kshedstrom commented Dec 4, 2024

Summary of what I know about all this. Thomas Neumann asked first in the MOM6 forum, then in a MOM6 issue, about open boundaries within the domain. My tests for these were shown to be incomplete. I have managed to put in enough changes to make Dr Neumann happy, but the rotational tests for these things still have problems.

For the rotational tests of OBCs, the model initializes parts of CS%OBC, but simultaneously keeps OBC_in. It is this latter structure which gets passed to MOM_state_initialization. Bits are then copied over to CS%OBC, but not quite all of them in all cases.

  1. The bits of the OBC structure which are saved on restart failed before I made them into pointers and made the pointers point to the CS%OBC bits.
  2. In MOM_barotropic, the updates to all the interior points are fine, but there are non-zero updates to eta at h points just outside the OBC. If the run goes long enough, these points can have negative thicknesses. I have made it look more symmetric.
  3. Fix the land masking for the OBCs when a masking depth is used.
  4. Changed tr_reg to tr_Reg to match the rest.
  5. Some experimentation with the rotate_OBC subroutines. Dr Neumann's test uses boundary data of the value=const type. Copying the buffer_dst from OBC_in to CS%OBC gets some of these across, also the tracer reservoir values. However, the tracer reservoir values get overwritten by an interior tracer value between the first call to step_MOM_dynamics and the second.
  6. Similar to the barotropic issue with eta outside, the tracer advection updates the tracer values outside. The flux values being used at the boundary cause these outside values to go out of bounds for the equation of state.
  7. This could use a lot more testing, but formerly N_S boundaries become E_W with a flow of the opposite sign.

@kshedstrom
Copy link
Author

I could be done with this for now. It's working without rotations as far as I know.

@Hallberg-NOAA
Copy link
Member

This PR includes numerous commits that have nothing to do with it. Please rebase this atop the latest version of dev/gfdl, which should hopefully eliminate some of the excess commits, most of which have already been merged into dev/gfdl.

 - Otherwise, the tracer values just outside the OBC get updated
 based on fluxes at the OBC and quickly go out of bounds of the
 equation of state.
 - The previous version did the wrong thing at northern boundaries,
 at a southern corner too.
 - It hasn't yet caused a blowup that I know of, but better to
   prevent any trouble while we're thinking about it.
@kshedstrom kshedstrom force-pushed the fix_obc_maskingdepth branch from ee42fdb to ec17094 Compare January 26, 2025 04:25
@@ -2448,17 +2449,69 @@ subroutine btstep(U_in, V_in, eta_in, dt, bc_accel_u, bc_accel_v, forces, pbce,
haloshift=iev-ie, unscale=US%L_to_m**2*GV%H_to_m)
endif

do j=jsv,jev
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where this is going, but I have 4 specific problems with the changes to MOM_barotropic.F90:

  1. This code setting factor does not change from one barotropic timestep to the next, and it might not even change between calls to btstep if the OBC segments don't change. It should be moved before the barotropic time-stepping loop that starts at line 1815, and perhaps it should be made into an element of barotropic_CS and set in barotropic_init().
  2. Why isn't there a similar change from using CS%IareaT to factor where eta_pred is calculated on lines 1928, 1942 or 1948?
  3. factor is not a very specific name that it does not reflect what this array does. Something like IareaT_OBCmask would reflect what this array contains, even if it is somewhat less pithy than I would like.
  4. The comment describing factor does not indicate that its units are [L-2 ~> m-2].

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have some good points about the factor code. The BT_OBC structure is set up for each call to btstep, so if we don't fix that, it will be hard to move anything to barotropic_init().

@@ -2897,6 +2897,26 @@ subroutine initialize_MOM(Time, Time_init, param_file, dirs, CS, &
! reservoirs are used.
call open_boundary_register_restarts(HI, GV, US, CS%OBC, CS%tracer_Reg, &
param_file, restart_CSp, use_temperature)
if (turns /= 0) then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are getting segmentation faults in the TC testing from code related to the rotated tracer reservoirs, which means that we can not accept this PR because all future PRs will also fail testing.

I suspect that the problem might be that when there are is rotation by 1 or 3 turns, we should have OBC_in%tres_x corresponding to CS%OBC%tres_y and vice versa. However, when there is rotation, these are (grid-space) arrays with different orientations and sizes so one can not simply point to the other. There has to be an explicit copy instead. These are scalars, so the sign will not change.

The specific problem that is leading to a segmentation fault might be something as simple as the fact that the pointers are just not nullified when they are declared on line 372 of MOM_open_boundary.F90, or it could be that by setting a pointer to point to another pointer, the associated test appears to be true, even if the pointer that is pointed to is not itself pointing to allocated data.

The same issues would be true of OBC_in%rx_oblique_u and the other fields, but these are only used for restarts, and I am pretty sure that these fields are never used from within OBC_in, so they might not even matter.

Also, although the OBC type is not opaque, I think that we should treat it as though it were to the extent possible, so if it does still prove necessary to retain this code, I think that this whole block of code should be moved into a subroutine in MOM_open_boundary.F90 that copies these fields from one ocean_OBC_type to another.


! Only doing this here for 1 turns, get flow directed in the opposite direction.
! Currents that are now E_W were N_S and the turns should change their sign.
if (turns == 1 .and. segment%is_E_or_W) then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that perhaps the logic here should be:

qturns = modulo(turns,4)
if ( (qturns == 2) .or. (qturns == 1 .and. segment%is_E_or_W) .or. (qturns == 3 .and. segment%is_N_or_S) ) then

Copy link
Member

@Hallberg-NOAA Hallberg-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for doing the rebase, @kshedstrom . That makes it so much easier to see what is going on with this PR!

I have added some questions and thoughts about what might be going on here and some things to try that might fix our testing failures, if you are feeling up to it.

Another option here might be to break up this PR into the first few commits that we can accept without controversy (namely those related to setting MASKING_DEPTH for the OBCs and masking out the tracer-point changes in the barotropic solver and tracer advection), and then circling back to sort out what is happening with the rotation later. (The commit changing the allocatable arrays into pointers seems to me like the primary suspect for the TC testing failures, based on the error messages we are getting with our TC testing failures. To see the errors, click on the red-x's that we are getting at #752.)

@kshedstrom
Copy link
Author

Good idea to try without the rotational fixes for now.

@kshedstrom kshedstrom mentioned this pull request Jan 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants