Set non iid num issues with an extra rule #707

Merged
merged 11 commits into from
May 10, 2023

Conversation

elisno
Member

@elisno elisno commented May 9, 2023

This PR adds an extra rule for setting the number of non-IID issues: the count is handled differently depending on whether the p-value is low (under 0.05) or high.

For the rule to take effect in the issue summary, we have to change which examples are flagged in the is_non_iid_issue column, since the summary's issue count is derived from that column.

elisno added 2 commits May 9, 2023 18:10
A new heuristic is used, where a low p_value indicates that at least one datapoint in the dataset should have an issue. Otherwise, we disregard the number of issues overall.
…dataset

Previously, no example was flagged. With the new heuristic, the "worst" scoring example is flagged.
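
The heuristic described in these commit messages can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function name `flag_non_iid_issues`, its parameters, and the `initial_flags` input are hypothetical, and it assumes that a lower score means a worse example and that "disregard the number of issues" means no example is flagged when the p-value is high.

```python
from typing import List, Sequence

# Threshold taken from the PR description ("under 0.05").
P_VALUE_THRESHOLD = 0.05


def flag_non_iid_issues(
    scores: Sequence[float],
    p_value: float,
    initial_flags: Sequence[bool],
) -> List[bool]:
    """Hypothetical helper: adjust per-example flags using the dataset-level p-value.

    Assumptions (not confirmed by the PR text):
      * lower score = worse example
      * a high p-value means no example should be flagged
    """
    if len(scores) != len(initial_flags):
        raise ValueError("scores and initial_flags must have the same length")
    if len(scores) == 0:
        return []

    if p_value >= P_VALUE_THRESHOLD:
        # High p-value: no evidence of a non-IID dataset, so flag nothing.
        return [False] * len(scores)

    flags = list(initial_flags)
    if not any(flags):
        # Low p-value but nothing flagged: flag the worst-scoring example
        # so that the summary reports at least one issue.
        worst = min(range(len(scores)), key=lambda i: scores[i])
        flags[worst] = True
    return flags
```

With a low p-value and no example flagged initially, the sketch flags only the example with the lowest score; with a high p-value it clears every flag, so the summary would report zero issues.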
@elisno elisno requested a review from jwmueller May 9, 2023 19:01
@@ -89,7 +89,7 @@ def test_find_issues(self, issue_manager, embeddings):
issue_manager.summary,
Member

Can we add more end-to-end tests of the non-IID check at the Datalab level?

@codecov

codecov bot commented May 9, 2023

Codecov Report

Merging #707 (86704c0) into master (49559ef) will decrease coverage by 0.28%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##           master     #707      +/-   ##
==========================================
- Coverage   95.74%   95.46%   -0.28%     
==========================================
  Files          46       46              
  Lines        3642     3639       -3     
  Branches      650      645       -5     
==========================================
- Hits         3487     3474      -13     
- Misses         80       85       +5     
- Partials       75       80       +5     
| Impacted Files | Coverage Δ |
|---|---|
| cleanlab/__init__.py | 100.00% <ø> (ø) |
| cleanlab/datalab/data.py | 96.55% <ø> (ø) |
| cleanlab/datalab/issue_manager/noniid.py | 90.96% <88.88%> (+1.15%) ⬆️ |

... and 7 files with indirect coverage changes

@elisno elisno merged commit c41c837 into cleanlab:master May 10, 2023