DOC improve documentation of NCR #1017
base: master
@glemaitre ready for review
doc/under_sampling.rst
Outdated
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :class:`NeighbourhoodCleaningRule` is another "cleaning" algorithm. It removes
samples from the majority class that are closest to the boundary with the minority
Suggested change:
- samples from the majority class that are closest to the boundary with the minority
+ samples from the majority class that are the closest to the boundary formed by the samples of the minority class
I don't totally understand this sentence. Let me try a modification in a new commit.
doc/under_sampling.rst
Outdated
The :class:`NeighbourhoodCleaningRule` expands on the cleaning performed by
:class:`EditedNearestNeighbours` by eliminating additional majority class samples if
they are among the 3 closest neighbours of a sample from the minority class.
We have a parameter controlling the 3-NN.
Suggested change:
- they are among the 3 closest neighbours of a sample from the minority class.
+ they are among the :math:`N` closest neighbours (i.e. using the parameter `n_neighbors`) of a sample from the minority class.
Throughout the docs we are using K as the number of neighbours, not N. I guess the n in `n_neighbors` comes from n=number. I'd rather stick to K if that's alright with you, for consistency. I'll fix this in a separate commit.
Actually, I removed this sentence altogether as per below suggestion.
The procedure for the :class:`NeighbourhoodCleaningRule` is as follows:

1. Remove observations from the majority class with edited nearest neighbors (ENN).
2. Remove additional samples from the majority class if they are one of the k closest
Since we are repeating the same sentence as above, I would remove the paragraph above and keep only the bullet-point sequence.
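The two steps in the bullet sequence can be sketched in plain Python as below. This is a minimal, hypothetical illustration, not the imbalanced-learn implementation: the helper names (`k_nearest`, `misclassified`, `neighbourhood_cleaning`) are made up, distances are Euclidean and computed by brute force, and "misclassified" is assumed to mean that fewer than half of a sample's K neighbours share its label. It further assumes that both steps are evaluated on the original data and that the flagged samples are combined, and that step 2 only fires for minority samples misclassified by their neighbours, as in the original NCR description. The `threshold_cleaning` condition is left out.

```python
from math import dist

# Hypothetical sketch of the two NCR steps; not the imbalanced-learn API.


def k_nearest(X, i, k):
    """Indices of the k nearest neighbours of X[i], excluding i itself (brute force)."""
    others = [j for j in range(len(X)) if j != i]
    # Sort by Euclidean distance; break ties by index so the result is deterministic.
    others.sort(key=lambda j: (dist(X[i], X[j]), j))
    return others[:k]


def misclassified(X, y, i, k):
    """Assumed rule: True if fewer than half of the k neighbours of X[i] share its label."""
    same = sum(y[j] == y[i] for j in k_nearest(X, i, k))
    return 2 * same < k


def neighbourhood_cleaning(X, y, minority, majority, k=3):
    # Step 1 (ENN): flag majority samples misclassified by their k neighbours.
    to_remove = {
        i for i, label in enumerate(y)
        if label == majority and misclassified(X, y, i, k)
    }
    # Step 2: for every minority sample misclassified by its k neighbours,
    # also flag the majority samples among those neighbours.
    for i, label in enumerate(y):
        if label == minority and misclassified(X, y, i, k):
            to_remove.update(j for j in k_nearest(X, i, k) if y[j] == majority)
    # Both steps looked at the original data; drop the union of flagged samples.
    keep = [i for i in range(len(y)) if i not in to_remove]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy 1-D data: a majority sample (5.05) sits inside the minority cluster, and a
# minority sample (0.22) sits inside the majority cluster.
X = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,), (5.0,), (5.1,), (5.2,), (0.22,), (5.05,)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
X_res, y_res = neighbourhood_cleaning(X, y, minority=1, majority=0, k=3)
print(X_res, y_res)
```

On this toy data, step 1 removes the intruder at 5.05, and step 2 removes the three majority neighbours (0.1, 0.2, 0.3) of the misclassified minority sample at 0.22, while no minority sample is ever dropped.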
doc/under_sampling.rst
Outdated
To carry out step 2 there is one condition: a sample will only be removed if its class
has a minimum number of observations. The minimum number of observations is regulated
by the `threshold_cleaning` parameter. In the original article
:cite:`laurikkala2001improving`, samples would be removed if the class had at
I would not go into details about the original paper; instead, just state that we check that the number of samples in the class to under-sample is above the threshold times the number of samples in the minority class.
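The check suggested above can be sketched as follows. The function name `classes_to_clean` is hypothetical; the sketch assumes a strict "greater than" comparison and considers every class other than the minority one, whereas the library may restrict the candidates further (for instance to the classes targeted by `sampling_strategy`). A default of 0.5 is used for `threshold_cleaning` in this sketch.

```python
from collections import Counter


def classes_to_clean(y, minority, threshold_cleaning=0.5):
    """Hypothetical helper: classes eligible for the step-2 cleaning.

    A class qualifies when its size is strictly greater than
    threshold_cleaning * (number of minority samples).
    """
    counts = Counter(y)
    limit = threshold_cleaning * counts[minority]
    return sorted(
        cls for cls, n in counts.items()
        if cls != minority and n > limit
    )


# Class sizes: 0 -> 10, 1 -> 2 (minority), 2 -> 4.
y = [0] * 10 + [1] * 2 + [2] * 4
print(classes_to_clean(y, minority=1))                          # [0, 2]
print(classes_to_clean(y, minority=1, threshold_cleaning=3.0))  # [0]
```

Raising `threshold_cleaning` makes step 2 more conservative: with a value of 3.0, only classes with more than 6 samples (3 times the 2 minority samples) would be cleaned.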
imblearn/under_sampling/_prototype_selection/_neighbourhood_cleaning_rule.py
Outdated
imblearn/under_sampling/_prototype_selection/_neighbourhood_cleaning_rule.py
Outdated
How can I check the linting error message?
imblearn/under_sampling/_prototype_selection/_neighbourhood_cleaning_rule.py
Outdated
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Thank you!
Reword documentation and docstrings for the NCR.
Related to #854