Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge to Rearrange-Reader: Add check if null should be replaced with nan #368

Conversation

dcslagel
Copy link
Collaborator

@dcslagel dcslagel commented Jul 4, 2020

  • fixes tests/test_null_policy.py::test_null_policy_NULL_none

While I'm pretty sure this is a good place in the code to fix this, I'm not fully confident that version_NULL is the right check for arr[arr == provisional_null] = np.nan. It might be NULL_POLICIES or NULL_SUBS that needs to be the check variable. Could you check it and advise?

Let me know if this change could be accepted (or rejected) or
needs some additional changes before being approved and merged.

Thank you,
DC

- fixes tests/test_null_policy.py::test_null_policy_NULL_none
@kinverarity1
Copy link
Owner

Good question! Your PR is correct.

For reference, I think this is what the process is - or should be - at the moment:

  1. The user tells lasio with the null_policy keyword argument how and where they want NaN to appear in the end result. This may include telling lasio to ignore the NULL. value from the ~V section and instead retain that value.
  2. lasio converts the null_policy to a list of regexps and values to find and convert to NaN when parsing the data section from file to memory. (Note that critically this happens during parsing, not after parsing. This is something that prevents a simple switch-over to using pandas.read_csv as a reader.) The conversion to a list of values occurs at reader.get_substitutions. This function also returns the boolean value version_NULL to indicate if the value indicated by the LAS file itself (NULL.) should also be replaced with NaN. At this stage, we don't know what NULL. value, or even that the LAS file defines it. That's the reason for provisional_null existing. It represents NULL..
  3. When parsing values in reader.get_data_section_iterative, the regexps and values are applied and a numpy array is returned which may or may not contain NaNs already. It may also contain values which have not yet been identified as needing conversion to NaNs, because we might not have read the NULL. value yet.
  4. If version_NULL is true, the final step is to convert NULL. (provisional_null) to NaN. Your PR fixes this step.

Thanks for cleaning up the broken tests.

@kinverarity1 kinverarity1 merged commit 6da4a11 into kinverarity1:rearrange-reader Jul 5, 2020
@dcslagel dcslagel deleted the rr-fix-null-policy-test-draft branch July 5, 2020 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants