-
Notifications
You must be signed in to change notification settings - Fork 786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a knn module #1117
Add a knn module #1117
Conversation
…earch object from an array of numerical features
…g the NearestNeighbors object
…ructing the NearestNeighbors object
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1117 +/- ##
==========================================
- Coverage 96.13% 96.05% -0.09%
==========================================
Files 76 80 +4
Lines 6088 6102 +14
Branches 1081 1075 -6
==========================================
+ Hits 5853 5861 +8
- Misses 140 143 +3
- Partials 95 98 +3 ☔ View full report in Codecov by Sentry. |
) | ||
knn = NearestNeighbors(n_neighbors=self.k, metric=self.metric).fit(features) | ||
|
||
if self.metric and self.metric != knn.metric: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition won't evaluate to True
, because even if the metric changed, then the knn
object must have been given the same metric in this part of the code.
Removing it is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall, except the remaining comments I've left.
Leaving final merge to @huiwengoh (I don't want to review this again)
@elisno Before you ping Hui Wen for the final review, please address all of my remaining comments, comment on them how they are addressed, and only then mark them as Resolved.
Co-authored-by: Jonas Mueller <1390638+jwmueller@users.noreply.github.com>
Co-authored-by: Jonas Mueller <1390638+jwmueller@users.noreply.github.com>
… knn object This solves two issues: - The default metric choice can be extracted from the knn object, so the calling function is no longer responsible for figuring out what metric to pick. Instead, it should assume that the knn object has set a valid metric attribute (knn.metric). - Removing the dependency on decide_default_metric means that we no longer need to cast features to a numpy array (statically, it was a Union[None, np.ndarray], which is incompatible with that function.
I've addressed the comments made by @jwmueller. @huiwen could you review + merge this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! nice work
Summary
This PR prioritizes a refactoring of the two modules
that rely heavily on a knn object (
NearestNeighbors
) for scoring and flagging different kinds of issues.MERGE THIS AFTER #1115 !
Included docs
More information
The logic for constructing a
NearestNeighbor
object has been duplicated in several places like so:This PR reduces this headache to a single function call, as this is usually some boilerplate code for creating a knn-object from scratch.
Additionally, there are some dev docs for module added in the new directory
cleanlab/internal/neighbor
. Those are not a priority and can be omitted in this PR.