Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extends non-iid issue check to run if only pred_probs are provided #857

Merged
merged 27 commits into from
Oct 30, 2023

Conversation

abhijitpal1247
Copy link
Contributor

@abhijitpal1247 abhijitpal1247 commented Oct 3, 2023

Closes #808
Added an additional param for pred_probs in find_issues function and an additional check in non_iid.py file.

@codecov
Copy link

codecov bot commented Oct 3, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8b8bf78) 96.70% compared to head (bbd4fa0) 96.49%.
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #857      +/-   ##
==========================================
- Coverage   96.70%   96.49%   -0.21%     
==========================================
  Files          65       65              
  Lines        5091     5188      +97     
  Branches      875      901      +26     
==========================================
+ Hits         4923     5006      +83     
- Misses         86       94       +8     
- Partials       82       88       +6     
Files Coverage Δ
cleanlab/datalab/internal/issue_finder.py 98.86% <ø> (ø)
cleanlab/datalab/internal/issue_manager/noniid.py 99.44% <100.00%> (-0.56%) ⬇️

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@jwmueller jwmueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @abhijitpal1247!

To make this easier to review, can you:

Add a basic unit test that evaluates the end-to-end usage of your new issue manager configuration from the Datalab level. Please comment a link to this unit test, so I can also run it myself.

Add a comment to your PR description showing example code to test this new functionality out, and what the output looks like exactly (copy/paste the resulting Datalab report() and its issues, issue_summary, and info attributes.

Thanks, we can get somebody to review this PR in more detail once those basics are provided. Then after your code structure has been verified to be good, we'll ask you to add a few more unit tests in order to get the codecov of this new code up to 100%.

@jwmueller jwmueller changed the title Extends non-iid issue check Extends non-iid issue check to run if only pred_probs are provided Oct 5, 2023
@abhijitpal1247
Copy link
Contributor Author

abhijitpal1247 commented Oct 6, 2023

@jwmueller

 issue_manager = NonIIDIssueManager(
            datalab=lab,
            num_permutations=15,
        )
issue_manager.find_issues(pred_probs=pred_probs)
        issues_sort, summary_sort, info_sort = (
            issue_manager.issues,
            issue_manager.summary,
            issue_manager.info,
        )

input:
preds_prob=[[0. ], [0.02], [0.04], [0.06], [0.08], [0.1 ], [0.12], [0.14], [0.16], [0.18], [0.2 ], [0.22], [0.24], [0.26], [0.28], [0.3 ], [0.32], [0.34], [0.36], [0.38], [0.4 ], [0.42], [0.44], [0.46], [0.48], [0.5 ], [0.52], [0.54], [0.56], [0.58], [0.6 ], [0.62], [0.64], [0.66], [0.68], [0.7 ], [0.72], [0.74], [0.76], [0.78], [0.8 ], [0.82], [0.84], [0.86], [0.88], [0.9 ], [0.92], [0.94], [0.96], [0.98]]
report:

---------------------- non_iid issues ----------------------

Number of examples with this issue: 1
Overall dataset quality in terms of this issue: 0.0000

Examples representing most severe instances of this issue:
    is_non_iid_issue  non_iid_score
46              True       0.621456
47             False       0.621464
48             False       0.624315
45             False       0.624483
2              False       0.628596

Additional Information: 
p-value: 0.0

issues:
[{'is_non_iid_issue': False, 'non_iid_score': 0.6369381182602835}, {'is_non_iid_issue': False, 'non_iid_score': 0.6312602312829466}, {'is_non_iid_issue': False, 'non_iid_score': 0.628595788692538}, {'is_non_iid_issue': False, 'non_iid_score': 0.6287408467480342}, {'is_non_iid_issue': False, 'non_iid_score': 0.6318799852036798}, {'is_non_iid_issue': False, 'non_iid_score': 0.6382381497632857}, {'is_non_iid_issue': False, 'non_iid_score': 0.6460975448148727}, {'is_non_iid_issue': False, 'non_iid_score': 0.6535407914178268}, {'is_non_iid_issue': False, 'non_iid_score': 0.660525427878278}, {'is_non_iid_issue': False, 'non_iid_score': 0.6670057807817809}, {'is_non_iid_issue': False, 'non_iid_score': 0.6729326069685921}, {'is_non_iid_issue': False, 'non_iid_score': 0.6782526852523574}, {'is_non_iid_issue': False, 'non_iid_score': 0.6829083514188119}, {'is_non_iid_issue': False, 'non_iid_score': 0.6868369695626632}, {'is_non_iid_issue': False, 'non_iid_score': 0.6899703325045949}, {'is_non_iid_issue': False, 'non_iid_score': 0.6922339840149807}, {'is_non_iid_issue': False, 'non_iid_score': 0.6935464560709836}, {'is_non_iid_issue': False, 'non_iid_score': 0.6938184157172033}, {'is_non_iid_issue': False, 'non_iid_score': 0.6929517187837507}, {'is_non_iid_issue': False, 'non_iid_score': 0.6908383724869578}, {'is_non_iid_issue': False, 'non_iid_score': 0.6873594169140365}, {'is_non_iid_issue': False, 'non_iid_score': 0.6823837482340884}, {'is_non_iid_issue': False, 'non_iid_score': 0.6757669266351432}, {'is_non_iid_issue': False, 'non_iid_score': 0.6673500430515782}, {'is_non_iid_issue': False, 'non_iid_score': 0.6569587659266529}, {'is_non_iid_issue': False, 'non_iid_score': 0.6444027599985143}, {'is_non_iid_issue': False, 'non_iid_score': 0.655574390589954}, {'is_non_iid_issue': False, 'non_iid_score': 0.6646638820609068}, {'is_non_iid_issue': False, 'non_iid_score': 0.6718590542325604}, {'is_non_iid_issue': False, 'non_iid_score': 0.6773309370896663}, {'is_non_iid_issue': False, 'non_iid_score': 0.6812347423842171}, {'is_non_iid_issue': False, 'non_iid_score': 0.6837110768967657}, {'is_non_iid_issue': False, 'non_iid_score': 0.684887244063972}, {'is_non_iid_issue': False, 'non_iid_score': 0.6848785406277429}, {'is_non_iid_issue': False, 'non_iid_score': 0.6837894941513081}, {'is_non_iid_issue': False, 'non_iid_score': 0.68171501249609}, {'is_non_iid_issue': False, 'non_iid_score': 0.6787414323250656}, {'is_non_iid_issue': False, 'non_iid_score': 0.6749474635518462}, {'is_non_iid_issue': False, 'non_iid_score': 0.6704050325112153}, {'is_non_iid_issue': False, 'non_iid_score': 0.6651800299014696}, {'is_non_iid_issue': False, 'non_iid_score': 0.659332971180139}, {'is_non_iid_issue': False, 'non_iid_score': 0.6529195777008692}, {'is_non_iid_issue': False, 'non_iid_score': 0.6459912868650473}, {'is_non_iid_issue': False, 'non_iid_score': 0.638595699189377}, {'is_non_iid_issue': False, 'non_iid_score': 0.6307769696279734}, {'is_non_iid_issue': False, 'non_iid_score': 0.6244834157786012}, {'is_non_iid_issue': True, 'non_iid_score': 0.6214564665072333}, {'is_non_iid_issue': False, 'non_iid_score': 0.6214637446963016}, {'is_non_iid_issue': False, 'non_iid_score': 0.6243154289023275}, {'is_non_iid_issue': False, 'non_iid_score': 0.6302170937742231}]
summary:
[{'issue_type': 'non_iid', 'score': 0.0}]
info:

{'p-value': 0.0, 'metric': 'euclidean', 'k': 10, 'statistics': {'weighted_knn_graph': <50x50 sparse matrix of type '<class 'numpy.float64'>'
	with 500 stored elements in Compressed Sparse Row format>, 'knn_metric': 'euclidean'}}

PS: Converted df to dictionary

@jwmueller jwmueller requested a review from elisno October 7, 2023 13:10
Copy link
Member

@elisno elisno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @abhijitpal1247!

The current tests appear solid. However, it would be beneficial to incorporate an end-to-end test case that validates the functionality with either features or pred_probs. Ensure both pathways operate as expected, particularly from the Datalab level rather than the IssueManagerlevel.

I've also provided a comment regarding a potential enhancement in the NonIIDIssueManager.find_issues method: consider breaking down the code into separate helper functions for improved readability and maintainability.

cleanlab/datalab/internal/issue_manager/noniid.py Outdated Show resolved Hide resolved
@elisno
Copy link
Member

elisno commented Oct 10, 2023

Hi @abhijitpal1247!

Thank you for swiftly putting up new tests! 🚀
I’d like to discuss a couple of adjustments to align the tests with our conventions for this package.

  1. We try out best to keep our tests modular and focused. Let’s move the end-to-end test involving Datalab to tests/datalab/test_datalab.py to maintain a clean separation and easy navigation through our test suites. For this PR, just make a brand new test class in that file.
  2. While the flags introduce flexibility in the tests, they can also add complexity. We prioritize keeping our tests straightforward and deterministic. Let’s refactor to have distinct test functions for different scenarios (classes) without employing flags to shift test behavior.

With these slight modifications, your contribution will perfectly align with our testing approach!

Please feel free to reach out if you have any questions.

@abhijitpal1247
Copy link
Contributor Author

abhijitpal1247 commented Oct 25, 2023

@elisno , I have added the changes. Please review it. As we are nearing to the end of hacktoberfest, I would love to get this merge.

@elisno
Copy link
Member

elisno commented Oct 27, 2023

Hi @abhijitpal1247, this PR is looking solid! I need to do a bit of cleanup, but then it can be merged today.

This commit introduces a temporary flag `_skip_storing_knn_graph_for_pred_probs` to
the `NonIIDIssueManager` class to address some breaking tests. This commit introduces a temporary flag `_skip_storing_knn_graph_for_pred_probs` to
the `NonIIDIssueManager` class to address some breaking tests. This change allows us
to bypass the storage of knn graphs based on `pred_probs` and is intended to be a
temporary solution.
The NonIIDIssueManager accepts pred_probs but doesn't rely on the labels, which is new.
An existing end-to-end test is updated here to reflect this change.
Copy link
Member

@elisno elisno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff @abhijitpal1247!

The argument handling from the Datalab side had to be configured and some of the end-to-end tests needed to be updated to reflect that.

Some existing tests broke as a result of this:

  • Some due to knn_graphs (implicitly based on some feature arrays) have always been prioritized.
    • This broke a test where incremental search would reuse the knn-graph computed from the pred_probs in the NonIIDIssueManager.
    • A short-term solution is just to add a flag to tell the issue manager to avoid storing the knn_graph when necessary. Sure, we use the knn_graph logic, but We still want to assume that most knn_graph come from a more general features array.
  • Previously, we assumed that a Datlab without any labels wouldn't find any issues when passed pred_probs, with this PR, these pred_probs can be used by the NonIIDIssueManager.

This PR is in a much better state now to merge! Thank you!

@elisno
Copy link
Member

elisno commented Oct 30, 2023

I'll review the mypy-stuff flagged in the CI in a separate PR.

The error:

cleanlab/datalab/internal/issue_manager/noniid.py:217: error: Argument 4 to "_select_features_and_setup_knn" of "NonIIDIssueManager" has incompatible type "str | bool | None"; expected "bool"  [arg-type]

@elisno elisno merged commit ccb4a90 into cleanlab:master Oct 30, 2023
18 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

extend non-iid issue check in Datalab
3 participants