-
Notifications
You must be signed in to change notification settings - Fork 786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extends non-iid issue check to run if only pred_probs are provided #857
Extends non-iid issue check to run if only pred_probs are provided #857
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #857 +/- ##
==========================================
- Coverage 96.70% 96.49% -0.21%
==========================================
Files 65 65
Lines 5091 5188 +97
Branches 875 901 +26
==========================================
+ Hits 4923 5006 +83
- Misses 86 94 +8
- Partials 82 88 +6
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @abhijitpal1247!
To make this easier to review, can you:
Add a basic unit test that evaluates the end-to-end usage of your new issue manager configuration from the Datalab level. Please comment a link to this unit test, so I can also run it myself.
Add a comment to your PR description showing example code to test this new functionality out, and what the output looks like exactly (copy/paste the resulting Datalab report()
and its issues
, issue_summary
, and info
attributes.
Thanks, we can get somebody to review this PR in more detail once those basics are provided. Then after your code structure has been verified to be good, we'll ask you to add a few more unit tests in order to get the codecov of this new code up to 100%.
input:
issues:
PS: Converted df to dictionary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @abhijitpal1247!
The current tests appear solid. However, it would be beneficial to incorporate an end-to-end test case that validates the functionality with either features
or pred_probs
. Ensure both pathways operate as expected, particularly from the Datalab
level rather than the IssueManager
level.
I've also provided a comment regarding a potential enhancement in the NonIIDIssueManager.find_issues
method: consider breaking down the code into separate helper functions for improved readability and maintainability.
Hi @abhijitpal1247! Thank you for swiftly putting up new tests! 🚀
With these slight modifications, your contribution will perfectly align with our testing approach! Please feel free to reach out if you have any questions. |
@elisno , I have added the changes. Please review it. As we are nearing to the end of hacktoberfest, I would love to get this merge. |
Hi @abhijitpal1247, this PR is looking solid! I need to do a bit of cleanup, but then it can be merged today. |
This commit introduces a temporary flag `_skip_storing_knn_graph_for_pred_probs` to the `NonIIDIssueManager` class to address some breaking tests. This commit introduces a temporary flag `_skip_storing_knn_graph_for_pred_probs` to the `NonIIDIssueManager` class to address some breaking tests. This change allows us to bypass the storage of knn graphs based on `pred_probs` and is intended to be a temporary solution.
The NonIIDIssueManager accepts pred_probs but doesn't rely on the labels, which is new. An existing end-to-end test is updated here to reflect this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff @abhijitpal1247!
The argument handling from the Datalab side had to be configured and some of the end-to-end tests needed to be updated to reflect that.
Some existing tests broke as a result of this:
- Some due to knn_graphs (implicitly based on some feature arrays) have always been prioritized.
- This broke a test where incremental search would reuse the knn-graph computed from the pred_probs in the NonIIDIssueManager.
- A short-term solution is just to add a flag to tell the issue manager to avoid storing the knn_graph when necessary. Sure, we use the knn_graph logic, but We still want to assume that most
knn_graph
come from a more generalfeatures
array.
- Previously, we assumed that a Datlab without any labels wouldn't find any issues when passed
pred_probs
, with this PR, thesepred_probs
can be used by the NonIIDIssueManager.
This PR is in a much better state now to merge! Thank you!
I'll review the mypy-stuff flagged in the CI in a separate PR. The error:
|
Closes #808
Added an additional param for pred_probs in find_issues function and an additional check in non_iid.py file.