[MRG] ENH: safe-level SMOTE #626
base: master
Conversation
This pull request introduces 1 alert when merging bcc3069 into 321b751 - view on LGTM.com

new alerts:
Codecov Report

@@            Coverage Diff             @@
##           master     #626      +/-   ##
==========================================
+ Coverage   97.93%   97.98%   +0.05%
==========================================
  Files          83       84       +1
  Lines        4784     4911     +127
==========================================
+ Hits         4685     4812     +127
  Misses         99       99

Continue to review full report at Codecov.
Thanks for the contribution. You will need to add tests to check that the new function is giving expected results.
Oh I see that you mentioned it now :)
I just added some tests. Any suggestions?
This pull request introduces 1 alert when merging 394d686 into 321b751 - view on LGTM.com

new alerts:
imblearn/over_sampling/_smote.py
Outdated
    sampling_strategy=BaseOverSampler._sampling_strategy_docstring,
    random_state=_random_state_docstring,
)
class SLSMOTE(BaseSMOTE):
@glemaitre SafeLevelSMOTE vs SLSMOTE?
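For context during the naming discussion, a minimal usage sketch, assuming the class keeps the SLSMOTE name from this PR and the standard imbalanced-learn fit_resample API; the import path follows the diff above, and the exact constructor signature may change before merging:

# Hypothetical usage sketch; the class may be renamed (e.g. SafeLevelSMOTE)
# and is imported from the module edited in this PR rather than a public API.
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling._smote import SLSMOTE

X, y = make_classification(n_samples=1000, n_classes=2,
                           weights=[0.9, 0.1], random_state=0)
# m_neighbors (shown in the diff below) controls the safe-level neighborhood.
sampler = SLSMOTE(random_state=0)
X_res, y_res = sampler.fit_resample(X, y)
print(Counter(y), Counter(y_res))  # the minority class is oversampled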
imblearn/over_sampling/_smote.py
Outdated
        self.m_neighbors = m_neighbors

    def _assign_sl(self, nn_estimator, samples, target_class, y):
I would use the name _assign_safe_levels unless it hurts the readability in the calling code.
Thanks!
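For readers new to the term, a standalone sketch of what a safe-level assignment computes (the number of same-class instances among each sample's nearest neighbors, per the paper's definition); the function name and the tiny dataset below are illustrative and not taken from this PR's code:

import numpy as np
from sklearn.neighbors import NearestNeighbors


def assign_safe_levels(nn_estimator, samples, target_class, y):
    # Safe level of a sample = number of its nearest neighbors belonging to
    # target_class (the minority class). The first neighbor returned is the
    # sample itself (the samples are in the fit set), so it is dropped.
    nn_idx = nn_estimator.kneighbors(samples, return_distance=False)[:, 1:]
    return np.sum(y[nn_idx] == target_class, axis=1)


# Tiny example: three minority (class 1) points and two majority (class 0) points.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1]])
y = np.array([1, 1, 1, 0, 0])
nn = NearestNeighbors(n_neighbors=3).fit(X)
print(assign_safe_levels(nn, X[y == 1], target_class=1, y=y))  # [2 2 2]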
This pull request introduces 1 alert when merging fd11e32 into afbf781 - view on LGTM.com

new alerts:
I've made the changes accordingly. I think it's probably ready to go through a detailed review.
I would suggest moving this implementation into smote_variants. The idea behind this move is to benchmark the SMOTE variants on a common benchmark across a large number of datasets and include in imbalanced-learn only the versions that show an advantage. You can see the discussion and contribute to it here: https://github.com/gykovacs/smote_variants/issues/14. @laurallu would this strategy be fine with you?
Since Safe-Level SMOTE already exists there, IMHO we should review @laurallu's PR and merge it in.
OK, let's do that. Let's open an issue to discuss the inclusion criteria and explain what we expect in the future. I will review this PR in the near future.
Thanks for pointing out smote_variants to me. I will check it out. I would love to see the inclusion criteria too, since I might code up more methods.
This is an implementation of the safe-level SMOTE proposed in the following paper:
C. Bunkhumpornpat, K. Sinapiromsaran, and C. Lursinsap, "Safe-Level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem," in: Theeramunkong T., Kijsirikul B., Cercone N., Ho T.B. (eds), Advances in Knowledge Discovery and Data Mining, PAKDD 2009, Lecture Notes in Computer Science, vol. 5476, Springer, Berlin, Heidelberg, pp. 475-482, 2009.
Todo list:
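For reference, a brief sketch of the gap-selection rule described in the cited paper; the function name, the example values, and the random-number handling below are illustrative and not taken from this PR's code:

import numpy as np


def safe_level_gap(sl_p, sl_n, rng):
    # Gap selection from the Safe-Level-SMOTE paper (illustrative sketch).
    # sl_p and sl_n are the safe levels (number of minority neighbors) of the
    # minority sample p and of its selected neighbor n.
    # Returns the interpolation gap in [0, 1], or None to skip the pair.
    if sl_p == 0 and sl_n == 0:
        return None                       # both look like noise: generate nothing
    if sl_n == 0:
        return 0.0                        # ratio is infinite but p is safe: duplicate p
    ratio = sl_p / sl_n
    if ratio == 1:
        return rng.uniform(0, 1)          # equally safe: interpolate anywhere
    if ratio > 1:
        return rng.uniform(0, 1 / ratio)  # p is safer: stay close to p
    return rng.uniform(1 - ratio, 1)      # n is safer: stay close to n


# The synthetic sample is then p + gap * (n - p).
rng = np.random.RandomState(0)
p, n = np.array([0.0, 0.0]), np.array([1.0, 1.0])
gap = safe_level_gap(sl_p=3, sl_n=1, rng=rng)
print(p + gap * (n - p))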