-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable GPU execution of atm_recover_large_step_variables via OpenACC #1220
base: develop
Are you sure you want to change the base?
Enable GPU execution of atm_recover_large_step_variables via OpenACC #1220
Conversation
NOTE: the changes in this PR slightly change answers. The changes seem to come from a couple of loops. In my regional test case, if the calculations of First this loop
And in this loop
|
Although there isn't anything in particular that looks suspect to me, it does seem like the differences in results after the first timestep of a 12-km regional simulation comparing with the current I also get a core dump at the end of the simulation on Derecho when using the following modules:
It may be worth taking a second look at the changes in this PR to see if we can find anything that's subtly off. |
168d6e3
to
4a923ad
Compare
@mgduda, this is ready for another round of review! There are slight answer differences, but isolated just to
|
@gdicker1 If I use the current HEAD of the
and
around lines 2860 and 2873 (in this PR branch) in order to get identical results with the baseline. If you agree that we should be initializing the |
I agree with those changes, and will get them pushed. I would assume we were just getting lucky before this point! |
@mgduda, ready for another review! |
@gdicker1 Even after applying the patch to compute |
cc7e7b7
to
adfdc2f
Compare
This should be ready for another review after pushing fixup c29fcb5. @mgduda, does this pass as bitwise identical with this output from
Above, the baseline results compared with this PR with:
|
I found the difference, I have different output requests in my streams.atmosphere for the baseline and the test run. This means some diagnostics are computed on different schedules. I'll ping again once I've made sure the runs are exactly the same. |
@mgduda, using the same streams.atmosphere settings, I now get the same results. This is ready for review! |
These changes enable the GPU execution of the atm_recover_large_step_variables_work subroutine by adding OpenACC directives. In order to factor out the time to transfer data between CPU and GPU within this routine, the new timer 'atm_recover_large_step_variables [ACC_data_xfer]' has been added to the log file. Changes include: - Preparing the routine for porting. Modifying whitespace to make regions clear, changing implicit loop assignments to be explicit, and fusing some loops. - Adding OpenACC parallel and loop directives to the do-loops. - Managing the invariant fields needed for this routine in mpas_atm_dynamics_{init,finalize} so they are available across timesteps. - Managing the other fields needed in the routine with OpenACC directives and using default(present) to ensure data isn't missed. default(present) clauses cause a run-time error if data isn't present and will help as we fuse data regions.
baa8b40
to
be338c9
Compare
This PR enables GPU calculation of post-acoustic step variable reconstitution by adding OpenACC directives to the
atm_recover_large_step_variables_work
routine.Timing information for the OpenACC data transfers in this routine is captured in the log file by a new timer:
atm_recover_large_step_variables [ACC_data_xfer]
Invariant fields used in the work routine are copied in during
mpas_atm_dynamics_init
and deleted inmpas_atm_dynamics_finalize
by building off the fields already handled in previous PRs.