Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression label quality scores #572

Merged
merged 127 commits into from
Jun 24, 2023
Merged

Conversation

krmayankb
Copy link
Contributor

@krmayankb krmayankb commented Dec 23, 2022

Regression module added.

  1. residual and outre scoring methods added.
  2. Documentation indexes are updated.
  3. Unit tests
  4. Tutorial Notebook
    driver code
labels = np.array([1,2,3,4])
pred_labels = np.array([2,2,5,4.1])
print(get_label_quality_score(labels, pred_labels))

@krmayankb krmayankb changed the title Regression label quality score Regression label quality scores Dec 23, 2022
@codecov
Copy link

codecov bot commented Dec 23, 2022

Codecov Report

Merging #572 (f138ae8) into master (2857333) will decrease coverage by 0.10%.
The diff coverage is 94.69%.

@@            Coverage Diff             @@
##           master     #572      +/-   ##
==========================================
- Coverage   96.31%   96.21%   -0.10%     
==========================================
  Files          56       60       +4     
  Lines        4392     4675     +283     
  Branches      768      809      +41     
==========================================
+ Hits         4230     4498     +268     
- Misses         85       91       +6     
- Partials       77       86       +9     
Impacted Files Coverage Δ
cleanlab/regression/learn.py 93.21% <93.21%> (ø)
cleanlab/__init__.py 100.00% <100.00%> (ø)
cleanlab/internal/regression_utils.py 100.00% <100.00%> (ø)
cleanlab/regression/__init__.py 100.00% <100.00%> (ø)
cleanlab/regression/rank.py 100.00% <100.00%> (ø)

Copy link
Member

@elisno elisno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I have several comments on the source code and tests.


def assert_valid_inputs(
labels: np.ndarray,
predictions: np.ndarray,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure this has been discussed elsewhere, but do you think predictions could get confused with pred_probs?
I don't think y_pred works here, as it doesn't complement the given labels.

I understand why we still use labels for the "given" targets, to be consistent with the rest of the package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elisno why would there be pred_probs for regression?

IMO the confusion that is more likely to happen here is that somebody just sees assert_valid_inputs() in some code and doesn't realize it is for regression rather than classification. To mitigate, could consider renaming this function: assert_valid_regression_inputs() but I don't have strong opinions either way


def assert_valid_inputs(
labels: np.ndarray,
predictions: np.ndarray,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self:
I believe the type hints should be different here.

pred_probs: npt.NDArray[np.floating]
# v.s.
prediction: npt.NDArray[np.number]

Comment on lines 15 to 22
# Check if labels and pred_labels are np.ndarray
if not isinstance(labels, np.ndarray) or not isinstance(predictions, np.ndarray):
raise TypeError("labels and pred_labels must be of type np.ndarray")

# Check if labels and pred_labels are of same shape
assert (
labels.shape == predictions.shape
), f"shape of label {labels.shape} and predicted labels {predictions.shape} are not same."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While predictions is not the same as pred_probs, note that there are some commonalities between this block and
cleanlab.internal.validation.assert_valid_inputs (v2.2.0 docs source).

No need to address this now, but this block could be put into the cleanlab.internal.validation module in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I agree that there is some similarity between this block and cleanlab.internal.validation.assert_valid_inputs. Currently, this has been created to consider different parameters of regression as these problems may require some arguments that are entirely different from classification. Also, the nomenclature of other arguments might not make sense for the regression task for example allow_one_class has no valid significance in regression.

It would be a good idea to move this to cleanlab.internal.validation in the future.

Copy link
Member

@jwmueller jwmueller Dec 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a TODO in code for this in suggestion above

cleanlab/regression/rank.py Outdated Show resolved Hide resolved
cleanlab/regression/rank.py Outdated Show resolved Hide resolved
cleanlab/regression/rank.py Show resolved Hide resolved
tests/test_regression.py Show resolved Hide resolved
tests/test_regression.py Outdated Show resolved Hide resolved
tests/test_regression.py Outdated Show resolved Hide resolved
tests/test_regression.py Show resolved Hide resolved
cleanlab/regression/rank.py Outdated Show resolved Hide resolved
krmayankb and others added 4 commits December 23, 2022 14:56
Co-authored-by: Elías Snorrason <eliassno@gmail.com>
Co-authored-by: Elías Snorrason <eliassno@gmail.com>
1. added typing hints for scoring funcs
2. Removed try-except block for raising value error.
3. grammatical corrections
4. knn and neighbors construction moved closer to first usage.
Co-authored-by: Elías Snorrason <eliassno@gmail.com>
Comment on lines 14 to 15

# Check if labels and pred_labels are np.ndarray
Copy link
Member

@jwmueller jwmueller Dec 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Check if labels and pred_labels are np.ndarray
# TODO: merge common code with cleanlab.internal.validation.assert_valid_inputs
# Check if labels and pred_labels are np.ndarray

Copy link
Member

@jwmueller jwmueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, awesome work!!

I made some further edits so please look through those.
After that, please clear the tutorial outputs.

Note I suggested making some methods private, so consider those suggestions before we merge this!

@jwmueller jwmueller merged commit 9f30220 into cleanlab:master Jun 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants