-
Notifications
You must be signed in to change notification settings - Fork 792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression label quality scores #572
Conversation
Codecov Report
@@ Coverage Diff @@
## master #572 +/- ##
==========================================
- Coverage 96.31% 96.21% -0.10%
==========================================
Files 56 60 +4
Lines 4392 4675 +283
Branches 768 809 +41
==========================================
+ Hits 4230 4498 +268
- Misses 85 91 +6
- Partials 77 86 +9
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I have several comments on the source code and tests.
|
||
def assert_valid_inputs( | ||
labels: np.ndarray, | ||
predictions: np.ndarray, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure this has been discussed elsewhere, but do you think predictions
could get confused with pred_probs
?
I don't think y_pred
works here, as it doesn't complement the given labels
.
I understand why we still use labels
for the "given" targets, to be consistent with the rest of the package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elisno why would there be pred_probs
for regression?
IMO the confusion that is more likely to happen here is that somebody just sees assert_valid_inputs()
in some code and doesn't realize it is for regression rather than classification. To mitigate, could consider renaming this function: assert_valid_regression_inputs()
but I don't have strong opinions either way
|
||
def assert_valid_inputs( | ||
labels: np.ndarray, | ||
predictions: np.ndarray, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self:
I believe the type hints should be different here.
pred_probs: npt.NDArray[np.floating]
# v.s.
prediction: npt.NDArray[np.number]
# Check if labels and pred_labels are np.ndarray | ||
if not isinstance(labels, np.ndarray) or not isinstance(predictions, np.ndarray): | ||
raise TypeError("labels and pred_labels must be of type np.ndarray") | ||
|
||
# Check if labels and pred_labels are of same shape | ||
assert ( | ||
labels.shape == predictions.shape | ||
), f"shape of label {labels.shape} and predicted labels {predictions.shape} are not same." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While predictions
is not the same as pred_probs
, note that there are some commonalities between this block and
cleanlab.internal.validation.assert_valid_inputs
(v2.2.0 docs source).
No need to address this now, but this block could be put into the cleanlab.internal.validation
module in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I agree that there is some similarity between this block and cleanlab.internal.validation.assert_valid_inputs
. Currently, this has been created to consider different parameters of regression as these problems may require some arguments that are entirely different from classification. Also, the nomenclature of other arguments might not make sense for the regression task for example allow_one_class
has no valid significance in regression.
It would be a good idea to move this to cleanlab.internal.validation
in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a TODO in code for this in suggestion above
Co-authored-by: Elías Snorrason <eliassno@gmail.com>
Co-authored-by: Elías Snorrason <eliassno@gmail.com>
1. added typing hints for scoring funcs 2. Removed try-except block for raising value error. 3. grammatical corrections 4. knn and neighbors construction moved closer to first usage. Co-authored-by: Elías Snorrason <eliassno@gmail.com>
|
||
# Check if labels and pred_labels are np.ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Check if labels and pred_labels are np.ndarray | |
# TODO: merge common code with cleanlab.internal.validation.assert_valid_inputs | |
# Check if labels and pred_labels are np.ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, awesome work!!
I made some further edits so please look through those.
After that, please clear the tutorial outputs.
Note I suggested making some methods private, so consider those suggestions before we merge this!
Regression module added.
driver code