Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nourishment functions #20

Merged
merged 14 commits into from
Jan 18, 2021
Merged

Add nourishment functions #20

merged 14 commits into from
Jan 18, 2021

Conversation

wrightky
Copy link
Collaborator

@wrightky wrightky commented Nov 9, 2020

PR introduces three main feature additions:

  • nourishment_area: Adds two functions for the computation and plotting of the "nourishment area" of a seed injection point, i.e. the region of the domain which receives material from the injection point. Following the convention set by the exposure_time functions, this includes: (A) particle_track.nourishment_area(), which loops through a walk_data, and returns a raster indicating how many occasions each cell was occupied by particles, normalized to the range 0-1. Because these paths are single realizations and inherently spatially-noisey, this function also allows you to smooth the output with a gaussian filter controlled by the variable sigma. (B) routines.show_nourishment_area(), which provides an easy aesthetically-pleasing way to plot that output overtop one of the existing grids. The visit frequency raster is shown as a heatmap, in which alphas fade to zero near the tail of the colormap, indicating areas sparsely visited by particles. Users can control most of the features of the plot using the function inputs, which I've tested on WLD and DeltaRCM outputs and looks pretty spiffy in both applications. Examples shown below

  • nourishment_time: Similarly adds two functions for the computation and plotting of the "nourishment time" of a seed injection point, i.e. the amount of time particles tend to spend in each area of the domain. In a way, this function is the time-equivalent of the above function: once particles get to a cell, how long on average do they stay there? Similarly broken up into computation (particle_track.nourishment_time()) and plotting (routines.show_nourishment_time()), with both functions working very similarly to above. Furthermore, if you take the product of the nourishment_time and nourishment_area output rasters, you obtain the same thing as if you just added up all of the time particles spent in each cell, which may be interesting in its own right. Example below

  • Native boundary-cropping for unstruct2grid: Adds an optional parameter, boundary, to this function, which will optionally crop the interpolated rasters to a domain boundary using matplotlib.path.Path(). Previously, if your domain wasn't a perfect rectangle, areas outside would still get interpolated to a numeric value, so this just allows people to fix that boundary behavior.

I think the nourishment functions would make for a good new example, which I'd be happy to add to this PR before it's merged, if you agree that would be a worthwhile addition. If not, we can merge whenever things look good.

NourishmentArea

nourish_area_rcm

nourish_time

@wrightky wrightky requested a review from elbeejay November 9, 2020 21:34
Copy link
Member

@elbeejay elbeejay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New functionality looks like it adds some pretty useful stuff. A few thoughts:

  • I think the nourishment functions would make for a good new example, which I'd be happy to add to this PR before it's merged, if you agree that would be a worthwhile addition.

    I do agree that this would be a nice addition to have in the example suite. Once created, the .py file should be added to the list of examples we run in our continuous integration in the "Run examples" section of the build.yml file (~line 100) so that in the future we ensure the example isn't broken by changes to it, or the underlying code it uses.

  • I got the impression that changes to the unstruct2grid function were intended to be backwards-compatible. However the failing unit test suggests that something has changed (although it isn't entirely obvious to me what is impacting the unit test). Before merging we need to figure out either what caused the change, and if we are okay with it we can modify the unit test, or the changes to unstruct2grid may need to be implemented in a slightly different way. We also should add a unit test for the new functionality being added to unstruct2grid

  • Similarly, we are currently unit testing all of the functions in particle_track.py, so with the addition of these new functions we should add unit tests to ensure they work as expected.

Comment on lines +1018 to +1021
for jj in list(range(1, len(walk_data['xinds'][ii])-1)):
# Running total of particles in this cell to find average
visit_freq[walk_data['xinds'][ii][jj],
walk_data['yinds'][ii][jj]] += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the calculation for nourishment_area() is embedded within the nourishment_time() function as well. Do you think there is value to combining these functions into a calculate_nourishment() function that has an input parameter "nourishment_type" or something similar that controls whether the visit frequency, mean time, or both are returned by the function? I can imagine wanting both the visit frequencies and mean times and it seems like they could be rolled into a single function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it and I think it will be simpler to use/read the code if we keep them as two separate functions. I think they're different enough that their use cases aren't guaranteed to overlap. Plus, I think each function may work best with different degrees of smoothing/clipping, so it's easiest if they're not tied together.

dorado/routines.py Outdated Show resolved Hide resolved
@wrightky
Copy link
Collaborator Author

Alright, I think this is ready for another look when you have a chance. In the latest commits, I have:

  • Added a new example using the nourishment_area and nourishment_time functions, including docs, which also updates the DeltaRCM example data to have discharge
  • Added unit tests for the nourishment_area and nourishment_time functions, as well as one for the new unstruct2grid boundary setting
  • Fixed the error in unstruct2grid that was failing the unit tests
  • Added an additional crop option to unstruct2grid, which can eliminate the all-nan border created when a boundary is specified, resulting in smaller output rasters
  • Moved the scipy imports in particle_track into the preamble for unstruct2grid, as it was the only function using that package and should help reduce the overhead when we call on run_iteration
  • Adopted a new algorithm for get_time_state which is cleaner, and should solve whatever weird issue was happening before.
  • Incremented the version number

On that last point, that seems to be the only line with commits that conflict with the existing commits. Could use some help figuring out how to resolve that, still not very familiar with handling parallel commit histories.

@wrightky wrightky requested a review from elbeejay December 18, 2020 21:59
Copy link
Member

@elbeejay elbeejay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some comments throughout, mostly minor. Some other comments that I didn't necessarily associate with a specific line are below:

  • It would be helpful to resolve the conflict in __init__.py so we can be sure this PR doesn't break anything

  • Does the nourishment area image need a colorbar? I don't mind either way but was just a thought.

  • Add tests that use the smoothing in the nourishment calculations (sigma > 0)

  • test_nourishment_area() fails when I run the unit tests locally (seems to be producing some nan values in place of some expected 0s?)

  • Do we know if get_time_state() is working as intended? Doesn't have to be resolved by this PR if there is an issue but we should open up an issue if there is a known bug

tests/test_particle_track.py Outdated Show resolved Hide resolved
docs/source/examples/example14.rst Outdated Show resolved Hide resolved
docs/source/examples/example14.rst Outdated Show resolved Hide resolved
dorado/routines.py Outdated Show resolved Hide resolved
dorado/routines.py Outdated Show resolved Hide resolved
dorado/routines.py Outdated Show resolved Hide resolved
@elbeejay
Copy link
Member

On that last point, that seems to be the only line with commits that conflict with the existing commits. Could use some help figuring out how to resolve that, still not very familiar with handling parallel commit histories.

Sorry I missed addressing this point. GitHub has a reasonable conflict resolution built into the website, so I think you can do it in the PR by hitting the "Resolve conflicts" button and working through the GitHub GUI. Especially since the conflict is pretty minor.

Otherwise the other option (I'm familiar with) would be doing it via the command line by fetching the "current" version of the code and pulling it into your branch. Then the git CLI will prompt you to resolve the conflict there (which would only be the single line in that case as well).

Let me know if you want more info or want to go through it together and we can do that sometime.

@wrightky
Copy link
Collaborator Author

wrightky commented Jan 4, 2021

  • Does the nourishment area image need a colorbar? I don't mind either way but was just a thought.

I think probably not, since everything is normalized. Included one for the time function because the nominal values mean something, whereas area is all relative.

  • Add tests that use the smoothing in the nourishment calculations (sigma > 0)

Aside from being exhaustive, do you think there's any utility in doing this? When sigma > 0, it isn't easy to compute what the a priori answer should be without just using this function to compute it, so we'd essentially be forcing this function to give us the answer it already gave us.

  • test_nourishment_area() fails when I run the unit tests locally (seems to be producing some nan values in place of some expected 0s?)

This was a good catch, fixed the assert to handle floats or nan

  • Do we know if get_time_state() is working as intended? Doesn't have to be resolved by this PR if there is an issue but we should open up an issue if there is a known bug

All my local testing of the new version works, there's very little that can go wrong now that we've collapsed the main loops/ifs into one simple numpy search. But I still don't know if this is what was giving Nelson problems

@elbeejay
Copy link
Member

elbeejay commented Jan 5, 2021

  • Does the nourishment area image need a colorbar? I don't mind either way but was just a thought.

I think probably not, since everything is normalized. Included one for the time function because the nominal values mean something, whereas area is all relative.

👍

  • Add tests that use the smoothing in the nourishment calculations (sigma > 0)

Aside from being exhaustive, do you think there's any utility in doing this? When sigma > 0, it isn't easy to compute what the a priori answer should be without just using this function to compute it, so we'd essentially be forcing this function to give us the answer it already gave us.

I think it is worth doing (beyond being exhaustive) for the following 2 reasons:

  1. Avoids any possibility of an errant typo or line-change accidentally disrupting the function or performance of that section of the code that slips by our visual check of a future PR
  2. Will throw an error if one of the functions from an upstream library (scipy.ndimage.gaussian_filter in this case) is changed in a way that breaks this functionality or alters the output; whether that is changing the number or order of expected inputs/outputs or altering the function internals and the subsequent output. Then we'd have a chance to do any/all of the following: 1) pin the scipy version to one that works; 2) update the function to reflect the new scipy syntax; 3) if the result of the function call changes we could add a warning in our code to let users know their result might change depending on the version of scipy they have (admittedly for this function this is unlikely).
  • test_nourishment_area() fails when I run the unit tests locally (seems to be producing some nan values in place of some expected 0s?)

This was a good catch, fixed the assert to handle floats or nan

👍

  • Do we know if get_time_state() is working as intended? Doesn't have to be resolved by this PR if there is an issue but we should open up an issue if there is a known bug

All my local testing of the new version works, there's very little that can go wrong now that we've collapsed the main loops/ifs into one simple numpy search. But I still don't know if this is what was giving Nelson problems

Sounds good.

@wrightky
Copy link
Collaborator Author

Think this might be ready!

@wrightky wrightky marked this pull request as ready for review January 16, 2021 00:05
@wrightky wrightky requested a review from elbeejay January 16, 2021 00:06
Copy link
Member

@elbeejay elbeejay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! 👍

@elbeejay elbeejay merged commit 4ab8cc7 into passaH2O:master Jan 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants