Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplifyFragmentData - unusual aggregation behavior #70

Closed
brownag opened this issue Aug 17, 2018 · 6 comments
Closed

simplifyFragmentData - unusual aggregation behavior #70

brownag opened this issue Aug 17, 2018 · 6 comments

Comments

@brownag
Copy link
Member

brownag commented Aug 17, 2018

On Julie's machine, rock fragment volume sums within classes returned by simplifyFragmentData() were not correct.

Stepping through the code... the data in NASIS were correctly populated, and the code to sort fragment records into size classes and sum within classes is working correctly.

The problem arises when going from long->wide (1 record per horizon with frags) via the reshape2::dcast() call.

rf.wide <- reshape2::dcast(rf.sums, fm, value.var = "volume", drop = FALSE)

A warning printed to the console about the default aggregation (length) function being used led me to try using fun.aggregate=sum in the dcast() call, which solved the problem. However, presumably, the sums have already been calculated via:

rf.sums <- ddply(rf.classes, c(id.var, "class"), plyr::summarise, volume = sum(fragvol, na.rm = TRUE))

Has something changed in reshape2? Has the default aggregation always been length??
I have been unable to replicate the error condition on my machine. Instead of correct volumetric sums, integer values between 1 and 5 were being returned after conversion from long->wide format. Could these numbers be the lengths (number of records contributing) to the "sum" for each class rather than the sum itself?

@brownag
Copy link
Member Author

brownag commented Aug 31, 2018

This issue cropped up on my machine this week. I was in the middle of something so I just fixed it by installing the "agb" branch with the above mentioned fix.

Is anyone else having this issue?

@dylanbeaudette
Copy link
Member

dylanbeaudette commented Sep 4, 2018

Thanks for reporting this. The default behavior of reshape2::dcast has always bothered me: there are few cases where a warning + change in behavior is a good idea. Do you have some examples that I could check? I'll use this as the basis for the first soilDB test suite element (884a234).

I suspect that the problem may be related to duplication with respect to id and class, or something I have overlooked. Again, an example pedon ID would be really helpful.

@brownag
Copy link
Member Author

brownag commented Sep 4, 2018

It is not clear which pedons are causing this issue, but you are correct that it seems to be some sort of an issue with how the pedon is specified or how the records are uniquely identified.

If you query all NASIS pedons in CA630 using fetchNASIS_pedons() you will notice that it defaults to length as the aggregation function which causes the total_frags_pct and total_frags_pct_nopf to calculate incorrectly.

@brownag
Copy link
Member Author

brownag commented Sep 4, 2018

If I had a smaller set of pedons that I could reproduce the error with, I would. But the smaller selected sets I have been working on are summing correctly.

dylanbeaudette added a commit that referenced this issue Sep 4, 2018
@dylanbeaudette
Copy link
Member

There were several issues that may have been contributing to the unusual and hard to replicate data:

  • NA in the fragment volume
  • NA in all of the fragment size (RV, and high)
  • bug in .sieve() by which 'para' was being added to 'NA' to create a bogus fragment size class

These were all accounted for in e2b237d.

This commit comes with a slight change in behavior: records with NA fragvol are removed before simplifyFragmentData() does anything. A warning is generated if this happens.

This commit addresses an issue in #43: simplfyFragmentData has been renamed to simplifyFragmentData, with a stub pointing to the old function name for backwards compatibility.

@dylanbeaudette
Copy link
Member

After some more testing, I no longer see the warning message from dcast. I think we can close this issue now.

Reminder to future self: don't trust dcast to always do the right thing, consider an aggregation function that throws an error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants