Hi Kyle,
This is Jenna from A2-AI. We're using yspec more and more in our day-to-day quality-control workflow and really appreciate how well designed the package is! We have identified a gap in functionality when checking that column entries fall within specified ranges. Would it be possible to expand `ys_document()` and `ys_check()` to support discontinuous ranges, i.e., ranges that are the union of an interval and one or more discrete values? For example, a column might use -999 as a flag while its other values fall within a known range, and we'd like to be able to include that isolated value in the yspec range.
Consider the following `Test.yml` and its `ys_document()` output:

```yaml
AMT:
  range: [[0, 100], -999]
```

```r
library(yspec)
# load the spec from file, then render its documentation
spec <- ys_load("Test.yml")
ys_document(spec)
```
The DETAILS column might then say: `numeric; valid range: [0 to 100] or {-999}`
Suppose instead that we’d like to specify multiple flags:
```yaml
AMT:
  range: [[0, 100], -999, -99]
```
In that case, the DETAILS column might say: `numeric; valid range: [0 to 100] or {-999, -99}`
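For illustration, a minimal sketch of how such a mixed range could be turned into the DETAILS text. `format_range()` is a hypothetical helper, not part of yspec, and it assumes the YAML has already been parsed into an R list whose elements are either two-element vectors (intervals) or single numbers (discrete values).

```r
# Hypothetical helper (not part of yspec): format a mixed range spec such as
# `range: [[0, 100], -999, -99]` for the DETAILS column. Assumes the YAML was
# parsed into a list of length-2 vectors (intervals) and scalars (discrete values).
format_range <- function(range) {
  # a plain `range: [0, 100]` arrives as a vector; wrap it as a single interval
  if (!is.list(range)) range <- list(range)
  is_interval <- vapply(range, function(r) length(r) == 2, logical(1))
  intervals <- vapply(
    range[is_interval],
    function(r) paste0("[", r[1], " to ", r[2], "]"),
    character(1)
  )
  discrete <- unlist(range[!is_interval])
  parts <- intervals
  if (length(discrete) > 0) {
    # collect all discrete values into one set, e.g. {-999, -99}
    parts <- c(parts, paste0("{", paste(discrete, collapse = ", "), "}"))
  }
  paste(parts, collapse = " or ")
}

format_range(list(c(0, 100), -999))
# "[0 to 100] or {-999}"
format_range(list(c(0, 100), -999, -99))
# "[0 to 100] or {-999, -99}"
```

Prefixing the result with `"numeric; valid range: "` gives the DETAILS strings above. One design choice in this sketch is that any two-element entry is read as an interval, so multiple flags must be listed as separate scalars, as in the YAML examples.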
Alternatively, we've considered a `flags` item for cases with multiple flags. Consider the following YAML:

```yaml
AMT:
  range: [0, 100]
  flags: {"missing": -999, "outlier": -888}
```
This doesn't render when `ys_document()` is called because `flags` isn't a defined item. If it were, the output could show the flags as key-value pairs, in the same way set values are shown:

| VARIABLE | LABEL | DETAILS | FLAGS |
|---|---|---|---|
| AMT | AMT | numeric; valid range: [0 to 100] | -999 = missing, -888 = outlier |

or

| VARIABLE | LABEL | DETAILS |
|---|---|---|
| AMT | AMT | numeric; valid range: [0 to 100]; flags: -999 = missing, -888 = outlier |

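A minimal sketch of producing the flag text in either layout, assuming the `flags` mapping is parsed into a named R list. `format_flags()` is a hypothetical helper for illustration, not an existing yspec function.

```r
# Hypothetical helper (not part of yspec): render a named `flags` list,
# e.g. parsed from `flags: {"missing": -999, "outlier": -888}`,
# as "value = name" pairs
format_flags <- function(flags) {
  values <- unlist(flags)
  paste(paste(values, "=", names(values)), collapse = ", ")
}

flags <- list(missing = -999, outlier = -888)

# separate FLAGS column
format_flags(flags)
# "-999 = missing, -888 = outlier"

# single-column variant, appended to the existing DETAILS text
paste0("numeric; valid range: [0 to 100]; flags: ", format_flags(flags))
```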
Additionally, these flag values could be treated as valid values in the range checks performed by `ys_check()`.
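As a rough sketch of the check we have in mind, the hypothetical `in_spec_range()` below accepts either a plain `c(lo, hi)` range or the mixed list form from the first proposal, plus an optional named `flags` list. This is only an illustration of the logic and does not reflect how `ys_check()` works internally. Missing values (`NA`) are reported as out of range here; a real check might skip them instead.

```r
# Hypothetical helper (not part of yspec): TRUE where x falls inside any
# interval in `range` or equals one of the discrete/flag values
in_spec_range <- function(x, range, flags = list()) {
  # a plain c(lo, hi) range is treated as a single interval
  if (!is.list(range)) range <- list(range)
  is_interval <- vapply(range, function(r) length(r) == 2, logical(1))
  ok <- rep(FALSE, length(x))
  for (r in range[is_interval]) {
    # NA values evaluate to FALSE here rather than NA
    ok <- ok | (!is.na(x) & x >= r[1] & x <= r[2])
  }
  # discrete values from `range` plus any named flags count as valid
  discrete <- c(unlist(range[!is_interval]), unlist(flags))
  ok | x %in% discrete
}

x <- c(5, 100, -999, -888, 150, NA)
in_spec_range(x, list(c(0, 100), -999))
# TRUE TRUE TRUE FALSE FALSE FALSE
in_spec_range(x, c(0, 100), list(missing = -999, outlier = -888))
# TRUE TRUE TRUE TRUE FALSE FALSE
```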
Let us know what you think, and whether this is something we could help implement. We're also curious whether Metrum is currently addressing these challenges with some other approach.
Thanks,
Jenna
cc: @dpastoor