datalab_regression #796

Merged
merged 121 commits into from
Nov 20, 2023

mglowacki100 (Contributor) commented Aug 7, 2023

Based on discussion #774
Preliminary work to ensure that the default task (classification) keeps working smoothly. Still to do: label.py for regression, tests for Datalab regression, docstring updates.

Usage example

from cleanlab import Datalab
import numpy as np

# Generate some toy-examples
N = 200
X = np.random.rand(N, 3)
y = 1 + 2 * X[:,0] + 3 * X[:,1] + 4 * X[:,2] + 0.05*np.random.randn(N)

# Run defaults
lab = Datalab(data={"y": y}, label_name='y', task="regression")
lab.find_issues(features=X)

# Run with any scikit-learn compatible model
from sklearn.neighbors import KNeighborsRegressor
lab = Datalab(data={"y": y}, label_name='y', task="regression")
model = KNeighborsRegressor(n_neighbors=3)
lab.find_issues(features=X, issue_types={"label": {"clean_learning_kwargs": {"model": model}}})
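
For intuition only, here is a minimal sketch of how a regression label check could rank examples by out-of-sample residuals. It is not Datalab's implementation: the helpers `loo_knn_predictions` and `residual_scores` are hypothetical names, the scoring rule is an assumption chosen for simplicity, and it uses plain Python rather than NumPy or scikit-learn so it runs standalone.

```python
import math
import random


def loo_knn_predictions(X, y, k=3):
    """Predict each label from its k nearest neighbours, leaving the point itself out."""
    preds = []
    for i, xi in enumerate(X):
        # Sort all other points by Euclidean distance to point i and keep the closest k.
        nearest = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        preds.append(sum(y[j] for j in nearest) / len(nearest))
    return preds


def residual_scores(y, preds):
    """Map absolute residuals to scores between 0 and 1; lower means more suspicious."""
    residuals = [abs(given - pred) for given, pred in zip(y, preds)]
    # Use the median residual as a crude scale; fall back to 1.0 if it is zero.
    scale = sorted(residuals)[len(residuals) // 2] or 1.0
    return [math.exp(-r / scale) for r in residuals]


# Toy data shaped like the example above, plus one deliberately corrupted
# label (an assumption made purely for this demonstration).
random.seed(0)
N = 200
X = [[random.random() for _ in range(3)] for _ in range(N)]
y = [1 + 2 * a + 3 * b + 4 * c + 0.05 * random.gauss(0, 1) for a, b, c in X]
y[0] += 10.0  # inject an obvious label error

scores = residual_scores(y, loo_knn_predictions(X, y))
most_suspicious = min(range(N), key=scores.__getitem__)
print("Most suspicious index:", most_suspicious)
```

The corrupted example should receive the lowest score, because its given label sits far from what its neighbours predict. The predictions are made out of sample (each point is excluded from its own neighbour set), so a bad label cannot vouch for itself.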

Preliminary work to ensure that default task (classification) will work smoothly.

CLAassistant commented Aug 7, 2023

CLA assistant check
All committers have signed the CLA.

elisno (Member) left a comment

Great start @mglowacki100!

Here are a few comments.

cleanlab/datalab/internal/issue_manager_factory.py (outdated, resolved)
cleanlab/datalab/internal/issue_manager_factory.py (outdated, resolved)
cleanlab/datalab/internal/issue_manager_factory.py (outdated, resolved)
cleanlab/datalab/internal/issue_manager_factory.py (outdated, resolved)
cleanlab/datalab/internal/issue_manager_factory.py (outdated, resolved)
cleanlab/datalab/internal/report.py (outdated, resolved)
cleanlab/datalab/internal/issue_finder.py (outdated, resolved)

elisno (Member) commented Aug 7, 2023

You may want to merge master into your branch before moving on with development.

cleanlab/datalab/datalab.py (outdated, resolved)
cleanlab/datalab/datalab.py (outdated, resolved)
cleanlab/datalab/datalab.py (outdated, resolved)
cleanlab/datalab/internal/data_issues.py (resolved)
cleanlab/datalab/internal/data_issues.py (resolved)
cleanlab/datalab/internal/data_issues.py (resolved)
cleanlab/datalab/internal/data_issues.py (resolved)
cleanlab/datalab/internal/data_issues.py (resolved)
mglowacki100 and others added 11 commits October 15, 2023 21:15
Doc-string added

Co-authored-by: Elías Snorrason <eliassno@gmail.com>
Co-authored-by: Elías Snorrason <eliassno@gmail.com>
Co-authored-by: Elías Snorrason <eliassno@gmail.com>
Co-authored-by: Elías Snorrason <eliassno@gmail.com>
Co-authored-by: Elías Snorrason <eliassno@gmail.com>
comment explaining workaround
function renaming to avoid conflict
name collision fix
@jwmueller jwmueller requested a review from elisno October 18, 2023 02:19
add test class for the IssueFinder when the task is regression
The register method and issue manager factories will be more maintainable if every issue manager is registered to a task.

For now, only label issue checks work for regression, as that is the only issue manager registered for regression.
Run through cases where it looks at the kwargs. Another case where it looks at the issue_types kwarg.
not within the scope of this PR
docstrings, whitespace,...
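
The task-based registration described in the commit notes above can be sketched roughly as follows. This is an illustrative assumption, not the code merged in this PR: `REGISTRY`, `register`, `resolve_issue_types`, and the manager class names are all hypothetical. It shows a registry keyed by task, where a check is only available for a task if a manager was registered for it, and where an explicit `issue_types` dictionary overrides the per-task defaults.

```python
# Hypothetical registry: task name -> {issue name -> issue manager class}.
REGISTRY = {}


def register(task, issue_name):
    """Class decorator that registers an issue manager under a given task."""
    def decorator(cls):
        REGISTRY.setdefault(task, {})[issue_name] = cls
        return cls
    return decorator


@register("classification", "label")
class ClassificationLabelIssueManager:
    """Placeholder for a classification label check."""


@register("classification", "outlier")
class OutlierIssueManager:
    """Placeholder for an outlier check."""


@register("regression", "label")
class RegressionLabelIssueManager:
    """Placeholder for a regression label check."""


def resolve_issue_types(task, issue_types=None):
    """Return {issue name: (manager class, kwargs)} for the checks to run."""
    available = REGISTRY.get(task)
    if not available:
        raise ValueError(f"No issue managers registered for task {task!r}")
    if issue_types is None:
        # No explicit request: run every check registered for this task.
        issue_types = {name: {} for name in available}
    unsupported = sorted(set(issue_types) - set(available))
    if unsupported:
        raise ValueError(f"Issue types {unsupported} are not supported for task {task!r}")
    return {name: (available[name], dict(kwargs)) for name, kwargs in issue_types.items()}


# Defaults for regression contain only the label check.
print(list(resolve_issue_types("regression")))
# Explicit kwargs are passed through to the selected manager.
print(resolve_issue_types("regression", {"label": {"clean_learning_kwargs": {}}}))
```

Keying the registry by task means that adding a new regression check is a single registration, and requesting a check that has no manager for the task fails early with a clear error.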

elisno (Member) left a comment

Wonderful work @mglowacki100!

I've made some adjustments to the registry and related functionality, and added handling for labels that are numerical values (for regression).

@elisno elisno merged commit a6d1319 into cleanlab:master Nov 20, 2023
20 checks passed

mglowacki100 (Contributor, Author)

Hi @elisno
Thank you for your kind words! I really appreciate your acknowledgment of the work.
I look forward to continuing to contribute to cleanlab.
Best regards,
Marek

4 participants