Regression label quality scores #572

krmayankb · 2022-12-23T20:22:57Z

Regression module added.

residual and outre scoring methods added.
Documentation indexes are updated.
Unit tests
Tutorial Notebook
driver code

import numpy as np

# get_label_quality_score is the scoring function added in this PR
labels = np.array([1, 2, 3, 4])
pred_labels = np.array([2, 2, 5, 4.1])
print(get_label_quality_score(labels, pred_labels))
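
For readers without the branch checked out, here is a minimal, self-contained sketch of what a residual-based score could look like on the same inputs. It assumes the score is the exponential of the negative absolute residual, so a perfect prediction scores 1 and larger errors approach 0. `residual_score` is a hypothetical stand-in for illustration, not the function added in this PR.

```python
import numpy as np


def residual_score(labels: np.ndarray, predictions: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: map absolute residuals to scores in (0, 1]."""
    residual = np.abs(predictions - labels)
    # Assumption: exp(-|residual|) so that lower scores flag likelier label errors.
    return np.exp(-residual)


labels = np.array([1, 2, 3, 4])
predictions = np.array([2, 2, 5, 4.1])
scores = residual_score(labels, predictions)
print(scores)
```

Under this assumption the third example (label 3, prediction 5) has the largest residual and therefore the lowest score.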

codecov · 2022-12-23T21:22:56Z

Codecov Report

Merging #572 (f138ae8) into master (2857333) will decrease coverage by 0.10%.
The diff coverage is 94.69%.

@@            Coverage Diff             @@
##           master     #572      +/-   ##
==========================================
- Coverage   96.31%   96.21%   -0.10%     
==========================================
  Files          56       60       +4     
  Lines        4392     4675     +283     
  Branches      768      809      +41     
==========================================
+ Hits         4230     4498     +268     
- Misses         85       91       +6     
- Partials       77       86       +9

Impacted Files	Coverage Δ
cleanlab/regression/learn.py	`93.21% <93.21%> (ø)`
cleanlab/__init__.py	`100.00% <100.00%> (ø)`
cleanlab/internal/regression_utils.py	`100.00% <100.00%> (ø)`
cleanlab/regression/__init__.py	`100.00% <100.00%> (ø)`
cleanlab/regression/rank.py	`100.00% <100.00%> (ø)`

elisno

This looks great! I have several comments on the source code and tests.

elisno · 2022-12-23T21:32:52Z

cleanlab/internal/regression_utils.py

+
+def assert_valid_inputs(
+    labels: np.ndarray,
+    predictions: np.ndarray,


I'm sure this has been discussed elsewhere, but do you think predictions could get confused with pred_probs?
I don't think y_pred works here, as it doesn't complement the given labels.

I understand why we still use labels for the "given" targets, to be consistent with the rest of the package.

@elisno why would there be pred_probs for regression?

IMO the more likely confusion here is that somebody sees assert_valid_inputs() in some code and doesn't realize it is for regression rather than classification. To mitigate this, we could consider renaming the function to assert_valid_regression_inputs(), but I don't have strong opinions either way.

elisno · 2022-12-23T21:39:00Z

cleanlab/internal/regression_utils.py

+
+def assert_valid_inputs(
+    labels: np.ndarray,
+    predictions: np.ndarray,


Note to self:
I believe the type hints should be different here.

pred_probs: npt.NDArray[np.floating]  # vs. predictions: npt.NDArray[np.number]
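
A minimal sketch of the distinction in this note, assuming `numpy.typing` is available (NumPy 1.21 or later). The helper names `is_number_array` and `is_floating_array` are hypothetical. The point is that an integer array counts as `np.number` but not as `np.floating`, so the looser hint would accept integer-valued regression targets.

```python
import numpy as np
import numpy.typing as npt


def is_number_array(arr: npt.NDArray[np.number]) -> bool:
    """Hypothetical helper: True for any numeric dtype (ints, floats, complex)."""
    return bool(np.issubdtype(arr.dtype, np.number))


def is_floating_array(arr: npt.NDArray[np.floating]) -> bool:
    """Hypothetical helper: True only for floating-point dtypes."""
    return bool(np.issubdtype(arr.dtype, np.floating))


int_targets = np.array([1, 2, 3])
float_targets = np.array([1.0, 2.5, 3.0])

print(is_number_array(int_targets), is_floating_array(int_targets))
print(is_number_array(float_targets), is_floating_array(float_targets))
```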

elisno · 2022-12-23T21:52:36Z

cleanlab/internal/regression_utils.py

+    # Check if labels and pred_labels are np.ndarray
+    if not isinstance(labels, np.ndarray) or not isinstance(predictions, np.ndarray):
+        raise TypeError("labels and pred_labels must be of type np.ndarray")
+
+    # Check if labels and pred_labels are of same shape
+    assert (
+        labels.shape == predictions.shape
+    ), f"shape of label {labels.shape} and predicted labels {predictions.shape} are not same."


While predictions is not the same as pred_probs, note that there are some commonalities between this block and
cleanlab.internal.validation.assert_valid_inputs (v2.2.0 docs source).

No need to address this now, but this block could be put into the cleanlab.internal.validation module in the future?

Yes, I agree that there is some similarity between this block and cleanlab.internal.validation.assert_valid_inputs. For now it is a separate function because regression problems may require arguments that are entirely different from classification. Also, some of the existing arguments might not make sense for regression; for example, allow_one_class has no meaning in a regression task.

It would be a good idea to move this to cleanlab.internal.validation in the future.

I've added a TODO in the code for this in a suggested change.
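
The checks discussed in this thread can be sketched as one regression-specific validator. This is an illustration under stated assumptions, not the code merged in this PR: the name `assert_valid_regression_inputs` follows the rename floated earlier in the review, and raising `ValueError` on a shape mismatch instead of using `assert` is a choice made here, since assertions are skipped when Python runs with `-O`.

```python
import numpy as np


def assert_valid_regression_inputs(labels, predictions) -> None:
    """Hypothetical validator: both inputs must be ndarrays of identical shape."""
    for name, arr in (("labels", labels), ("predictions", predictions)):
        if not isinstance(arr, np.ndarray):
            raise TypeError(
                f"{name} must be of type np.ndarray, got {type(arr).__name__}"
            )
    if labels.shape != predictions.shape:
        raise ValueError(
            f"shape of labels {labels.shape} and predictions "
            f"{predictions.shape} are not the same"
        )


# Passes silently for matching ndarrays.
assert_valid_regression_inputs(np.array([1.0, 2.0]), np.array([1.1, 1.9]))
```

Keeping the type check in a loop makes the error message name the offending argument, which the combined `or` check in the diff above cannot do.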

jwmueller · 2022-12-24T02:19:40Z

cleanlab/internal/regression_utils.py

+
+    # Check if labels and pred_labels are np.ndarray


Suggested change

-    # Check if labels and pred_labels are np.ndarray
+    # TODO: merge common code with cleanlab.internal.validation.assert_valid_inputs
+    # Check if labels and pred_labels are np.ndarray

jwmueller

LGTM, awesome work!!

I made some further edits so please look through those.
After that, please clear the tutorial outputs.

Note I suggested making some methods private, so consider those suggestions before we merge this!

krmayankb added 11 commits October 11, 2022 11:45

added basic regression ranking

a9282da

minor fixes, docstring modified

0503595

tutorial added, added to docs index pages

0a0c41e

unit tests added

f4a8d17

reindexed tutorial, punctuation fix for docstring

5aee141

plots changed in tutorial notebook

03fbc18

typo fix

29d6080

cleanlab outlier based scoring method added

bf7860e

regression_utils created

9bf8a5f

pred_labels changed to predictions

c399ffc

unit tests for new scoring method

2519550

krmayankb changed the title ~~Regression label quality score~~ Regression label quality scores Dec 23, 2022

krmayankb and others added 5 commits December 23, 2022 12:58

init merge conflict resolved

9d00253

tutorial draft1

1a9409f

tutorial draft1

8705398

merge conflict

c081913

Merge branch 'master' into regression

5a8a22d

elisno reviewed Dec 23, 2022

cleanlab/regression/rank.py (outdated)

krmayankb and others added 4 commits December 23, 2022 14:56

default modified for method in docstring

d80e077

grammatical correction in rank.py

02defb9

Co-authored-by: Elías Snorrason <eliassno@gmail.com>

Update cleanlab/regression/rank.py

4a0a9ef

Co-authored-by: Elías Snorrason <eliassno@gmail.com>

rank.py updates

f2c5862

1. Added typing hints for scoring funcs.
2. Removed try-except block for raising value error.
3. Grammatical corrections.
4. knn and neighbors construction moved closer to first usage.

Co-authored-by: Elías Snorrason <eliassno@gmail.com>

jwmueller reviewed Dec 24, 2022

docs/source/tutorials/index.rst

jwmueller reviewed Dec 24, 2022

docs/source/tutorials/regression.ipynb

jwmueller reviewed Dec 24, 2022

cleanlab/internal/regression_utils.py (outdated)

jwmueller reviewed Dec 24, 2022

cleanlab/internal/regression_utils.py

jwmueller reviewed Dec 24, 2022

cleanlab/regression/rank.py (outdated)