Enable GPU execution of atm_recover_large_step_variables via OpenACC #1220

gdicker1 · 2024-07-30T21:18:47Z

This PR enables GPU calculation of post-acoustic step variable reconstitution by adding OpenACC directives to the atm_recover_large_step_variables_work routine.

Timing information for the OpenACC data transfers in this routine is captured in the log file by a new timer: atm_recover_large_step_variables [ACC_data_xfer]

Invariant fields used in the work routine are copied in during mpas_atm_dynamics_init and deleted in mpas_atm_dynamics_finalize by building off the fields already handled in previous PRs.

gdicker1 · 2024-07-30T21:25:07Z

NOTE: the changes in this PR slightly change answers. The changes seem to come from a couple of loops. In my regional test case, if the calculations of rho_p, rho_zz, rtheta_p, exner, and pressure_p are done on the CPU instead, no differences occur.

First this loop

         do k = 1, nVertLevels
            rho_p(k,iCell) = rho_p_save(k,iCell) + rho_pp(k,iCell)
            rho_zz(k,iCell) = rho_p(k,iCell) + rho_base(k,iCell)
         end do

And in this loop

            do k = 1, nVertLevels
               rtheta_p(k,iCell) = rtheta_p_save(k,iCell) + rtheta_pp(k,iCell) &
                                 - dt * rho_zz(k,iCell) * rt_diabatic_tend(k,iCell)
               theta_m(k,iCell) = (rtheta_p(k,iCell) + rtheta_base(k,iCell))/rho_zz(k,iCell)
               exner(k,iCell) = (zz(k,iCell)*(rgas/p0)*(rtheta_p(k,iCell)+rtheta_base(k,iCell)))**rcv
               ! pressure_p is perturbation pressure
               pressure_p(k,iCell) = zz(k,iCell) * rgas * (exner(k,iCell)*rtheta_p(k,iCell)+rtheta_base(k,iCell)  &
                                                          * (exner(k,iCell)-exner_base(k,iCell)))
            end do

mgduda · 2024-08-02T00:04:21Z

Although there isn't anything in particular that looks suspect to me, it does seem like the differences in results after the first timestep of a 12-km regional simulation comparing with the current develop branch are a bit larger than I would have expected (based on the differences we see before and after the merge of #1216).

I also get a core dump at the end of the simulation on Derecho when using the following modules:

module --force purge
ml ncarenv/23.09    
ml craype/2.7.23    
ml nvhpc/24.3
ml ncarcompilers/1.0.0
ml cray-mpich/8.1.27
ml parallel-netcdf/1.12.3
ml cuda/12.2.1

It may be worth taking a second look at the changes in this PR to see if we can find anything that's subtly off.

gdicker1 · 2024-08-29T21:47:47Z

@mgduda, this is ready for another round of review!

There are slight answer differences, but isolated just to exner and pressure_p from what I can tell. No answer differences occur if the code below is added after the loop to calculate rtheta_p, theta_m, exner, and pressure_p (near line 2892 in this PR).

         ! Calculate exner and pressure_p on CPU
         !$acc update host(rtheta_p)
         do iCell=cellStart,cellEnd
            do k = 1, nVertLevels
               exner(k,iCell) = (zz(k,iCell)*(rgas/p0)*(rtheta_p(k,iCell)+rtheta_base(k,iCell)))**rcv
               ! pressure_p is perturbation pressure
               pressure_p(k,iCell) = zz(k,iCell) * rgas * (exner(k,iCell)*rtheta_p(k,iCell)+rtheta_base(k,iCell)  &
                                                          * (exner(k,iCell)-exner_base(k,iCell)))
            end do
         end do
         !$acc update device(exner,pressure_p)

mgduda · 2024-12-11T00:38:10Z

@gdicker1 If I use the current HEAD of the develop branch (ce4d670) as a baseline, I find that in addition to the code you mention above to compute exner and pressure_p on the host, I also need to add

         rw(1,iCell) = 0.0

and

         rw(nVertLevels+1,iCell) = 0.0

around lines 2860 and 2873 (in this PR branch) in order to get identical results with the baseline.
It makes sense to me that this might be needed, since we create rather than copyin the rw field; but what surprises me is that this isn't needed in the PR branch (if we use 63fdcf7 as the baseline, and don't merge onto the HEAD of the develop branch). Perhaps we are just lucky in getting some zeros in the memory that's given to us on the device?

If you agree that we should be initializing the rw field at the first and last level, can you add those changes to the PR branch?

gdicker1 · 2024-12-12T22:16:42Z

@gdicker1 If I use the current HEAD of the develop branch (ce4d670) as a baseline, I find that in addition to the code you mention above to compute exner and pressure_p on the host, I also need to add ...

I agree with those changes, and will get them pushed. I would assume we were just getting lucky before this point!

gdicker1 · 2024-12-30T18:09:21Z

@mgduda this is ready for re-review. Apologies for not pinging after I pushed cde6447.

src/core_atmosphere/dynamics/mpas_atm_time_integration.F

gdicker1 · 2025-01-14T17:49:57Z

@mgduda, ready for another review!

mgduda · 2025-01-15T00:48:45Z

@gdicker1 Even after applying the patch to compute exner and pressure_p on the host, I'm seeing answer differences with the changes in this PR. Can you verify that you are able to get bitwise-identical results when correcting the computation of just exner and pressure_p?

mgduda · 2025-01-16T16:38:27Z

@gdicker1 It looks like there's a possible bug in your commit f631856: wwAvg and ruAvg are now in a delete clause rather than a copyout clause.

gdicker1 · 2025-01-16T18:20:06Z

Force-push cc7e7b7 to adfdc2f to rebase this branch on develop. (Currently 4ea6abb, the merge of #1223 onto develop).

gdicker1 · 2025-01-16T18:33:51Z

This should be ready for another review after pushing fixup c29fcb5.

@mgduda, does this pass as bitwise identical with this output from compare_netcdf.py?

Comparing files: f1 - f2
        f1=baseline_acc_1736985688/restart.2019-09-01_00.06.00.nc
        f2=merge-dev-1223_1737050011/restart.2019-09-01_00.06.00.nc

            Variable  Min       Max       
=========================================
mminlu is not a numeric field and will not be compared
initial_time is not a numeric field and will not be compared
xtime is not a numeric field and will not be compared
              relhum -10.240860  106.321869
        refl10cm_max -35.000000  2.681815

Above, the baseline results compared with this PR with:

The PR branch modified so the calculations for exner and pressure_p are conducted on the host
Both the baseline and test compiled with OPENACC=true

gdicker1 · 2025-01-16T21:45:34Z

@mgduda, does this pass as bitwise identical with this output from compare_netcdf.py?
...

I found the difference, I have different output requests in my streams.atmosphere for the baseline and the test run. This means some diagnostics are computed on different schedules. I'll ping again once I've made sure the runs are exactly the same.

gdicker1 · 2025-01-16T22:09:33Z

@mgduda, using the same streams.atmosphere settings, I now get the same results. This is ready for review!

src/core_atmosphere/dynamics/mpas_atm_time_integration.F

These changes enable the GPU execution of the atm_recover_large_step_variables_work subroutine by adding OpenACC directives. In order to factor out the time to transfer data between CPU and GPU within this routine, the new timer 'atm_recover_large_step_variables [ACC_data_xfer]' has been added to the log file. Changes include: - Preparing the routine for porting. Modifying whitespace to make regions clear, changing implicit loop assignments to be explicit, and fusing some loops. - Adding OpenACC parallel and loop directives to the do-loops. - Managing the invariant fields needed for this routine in mpas_atm_dynamics_{init,finalize} so they are available across timesteps. - Managing the other fields needed in the routine with OpenACC directives and using default(present) to ensure data isn't missed. default(present) clauses cause a run-time error if data isn't present and will help as we fuse data regions.

gdicker1 · 2025-01-17T23:13:20Z

Force-push baa8b40 to be338c9 squashes this into one commit.

mgduda self-requested a review July 31, 2024 21:29

mgduda added Atmosphere OpenACC Work related to OpenACC acceleration of code labels Jul 31, 2024

gdicker1 force-pushed the atmosphere/acc_recover_large_step branch from 168d6e3 to 4a923ad Compare August 29, 2024 21:42

mgduda reviewed Jan 14, 2025

View reviewed changes

src/core_atmosphere/dynamics/mpas_atm_time_integration.F Outdated Show resolved Hide resolved

mgduda reviewed Jan 14, 2025

View reviewed changes

src/core_atmosphere/dynamics/mpas_atm_time_integration.F Outdated Show resolved Hide resolved

mgduda self-requested a review January 14, 2025 23:45

gdicker1 force-pushed the atmosphere/acc_recover_large_step branch from cc7e7b7 to adfdc2f Compare January 16, 2025 18:18

mgduda requested changes Jan 17, 2025

View reviewed changes

src/core_atmosphere/dynamics/mpas_atm_time_integration.F Outdated Show resolved Hide resolved

gdicker1 force-pushed the atmosphere/acc_recover_large_step branch from baa8b40 to be338c9 Compare January 17, 2025 23:12

mgduda self-requested a review January 18, 2025 00:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable GPU execution of atm_recover_large_step_variables via OpenACC #1220

Enable GPU execution of atm_recover_large_step_variables via OpenACC #1220

gdicker1 commented Jul 30, 2024

gdicker1 commented Jul 30, 2024

mgduda commented Aug 2, 2024

gdicker1 commented Aug 29, 2024

mgduda commented Dec 11, 2024

gdicker1 commented Dec 12, 2024

gdicker1 commented Dec 30, 2024

gdicker1 commented Jan 14, 2025

mgduda commented Jan 15, 2025

mgduda commented Jan 16, 2025

gdicker1 commented Jan 16, 2025

gdicker1 commented Jan 16, 2025

gdicker1 commented Jan 16, 2025

gdicker1 commented Jan 16, 2025

gdicker1 commented Jan 17, 2025

Enable GPU execution of atm_recover_large_step_variables via OpenACC #1220

Are you sure you want to change the base?

Enable GPU execution of atm_recover_large_step_variables via OpenACC #1220

Conversation

gdicker1 commented Jul 30, 2024

gdicker1 commented Jul 30, 2024

mgduda commented Aug 2, 2024

gdicker1 commented Aug 29, 2024

mgduda commented Dec 11, 2024

gdicker1 commented Dec 12, 2024

gdicker1 commented Dec 30, 2024

gdicker1 commented Jan 14, 2025

mgduda commented Jan 15, 2025

mgduda commented Jan 16, 2025

gdicker1 commented Jan 16, 2025

gdicker1 commented Jan 16, 2025

gdicker1 commented Jan 16, 2025

gdicker1 commented Jan 16, 2025

gdicker1 commented Jan 17, 2025