
[MRG] ENH: safe-level SMOTE #626

Open
wants to merge 5 commits into
base: master

Conversation


@laurallu laurallu commented Nov 4, 2019

This is an implementation of the safe-level SMOTE proposed in the following paper:

C. Bunkhumpornpat, K. Sinapiromsaran, C. Lursinsap, "Safe-level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem," In: Theeramunkong T.,
Kijsirikul B., Cercone N., Ho TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg, 475-482, 2009.
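The core of the method is the gap-selection rule from the paper: a synthetic sample is placed along the line between a minority sample p and a selected neighbor n, biased toward whichever endpoint has the higher safe level (the number of minority-class instances among its nearest neighbors). A minimal sketch of that rule, assuming the safe levels have already been computed (function and variable names are illustrative, not this PR's API):

```python
import numpy as np

def interpolation_gap(sl_p, sl_n, rng):
    """Pick gap in [0, 1] for x_new = p + gap * (n - p), following the
    case analysis in Bunkhumpornpat et al. (2009).

    sl_p, sl_n: safe levels of the minority sample p and its selected
    neighbor n (counts of minority-class nearest neighbors).
    Returns None when no sample should be generated.
    """
    if sl_p == 0 and sl_n == 0:
        return None                              # both unsafe: generate nothing
    if sl_n == 0:
        return 0.0                               # ratio "infinite": duplicate p
    ratio = sl_p / sl_n
    if ratio == 1:
        return rng.uniform(0.0, 1.0)             # equally safe: anywhere on the line
    if ratio > 1:
        return rng.uniform(0.0, 1.0 / ratio)     # p is safer: stay close to p
    return rng.uniform(1.0 - ratio, 1.0)         # n is safer: stay close to n

# usage: p is safer than n, so the new point stays near p
rng = np.random.default_rng(0)
p, n = np.array([0.0, 0.0]), np.array([1.0, 1.0])
gap = interpolation_gap(sl_p=4, sl_n=2, rng=rng)  # drawn from [0, 0.5]
x_new = p + gap * (n - p)
```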

Todo list:

  • add unit tests


lgtm-com bot commented Nov 4, 2019

This pull request introduces 1 alert when merging bcc3069 into 321b751 - view on LGTM.com

new alerts:

  • 1 for Redundant comparison


codecov bot commented Nov 4, 2019

Codecov Report

Merging #626 into master will increase coverage by 0.05%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #626      +/-   ##
==========================================
+ Coverage   97.93%   97.98%   +0.05%     
==========================================
  Files          83       84       +1     
  Lines        4784     4911     +127     
==========================================
+ Hits         4685     4812     +127     
  Misses         99       99
Impacted Files Coverage Δ
imblearn/over_sampling/_smote.py 97.73% <100%> (+0.52%) ⬆️
imblearn/over_sampling/__init__.py 100% <100%> (ø) ⬆️
...blearn/over_sampling/tests/test_safelevel_smote.py 100% <100%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update afbf781...866a04f.

@glemaitre
Member

Thanks for the contribution.

You will need to add tests to check that the new function gives the expected results.

@glemaitre
Member

Oh I see that you mentioned it now :)

@laurallu
Author

laurallu commented Nov 6, 2019

I just added some tests. Any suggestions?


lgtm-com bot commented Nov 6, 2019

This pull request introduces 1 alert when merging 394d686 into 321b751 - view on LGTM.com

new alerts:

  • 1 for Redundant comparison

sampling_strategy=BaseOverSampler._sampling_strategy_docstring,
random_state=_random_state_docstring,
)
class SLSMOTE(BaseSMOTE):
Member

@glemaitre SafeLevelSMOTE vs SLSMOTE


self.m_neighbors = m_neighbors

def _assign_sl(self, nn_estimator, samples, target_class, y):
Member

I would use the name _assign_safe_levels unless it hurts the readability in the calling code.
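For context, a hypothetical sketch of what a renamed `_assign_safe_levels` helper might compute: the safe level of each sample is the count of `target_class` instances among its nearest neighbors. The signature mirrors the diff above, and it assumes `nn_estimator` is a fitted scikit-learn `NearestNeighbors` whose first returned neighbor of each training sample is the sample itself; this is an illustration, not the PR's actual code:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def _assign_safe_levels(nn_estimator, samples, target_class, y):
    # First neighbor of each sample is the sample itself, so drop column 0.
    nn_index = nn_estimator.kneighbors(samples, return_distance=False)[:, 1:]
    # Safe level = number of target-class (minority) nearest neighbors.
    return (y[nn_index] == target_class).sum(axis=1)

# usage on a toy dataset: two well-separated clusters
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([1, 1, 1, 0, 0, 0])
nn = NearestNeighbors(n_neighbors=3).fit(X)  # self + 2 real neighbors
levels = _assign_safe_levels(nn, X, target_class=1, y=y)
```

Here the minority samples (class 1) each have two minority neighbors, so their safe level is 2, while the majority samples get 0.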

Member

chkoar commented Nov 6, 2019

Thanks!

I just added some tests. Any suggestions?

  1. Please use full names in variables, if you can. E.g. sl should be safe_levels. Unless it hurts the readability.
  2. Could you add a section in the documentation?


lgtm-com bot commented Nov 11, 2019

This pull request introduces 1 alert when merging fd11e32 into afbf781 - view on LGTM.com

new alerts:

  • 1 for Redundant comparison

@laurallu laurallu changed the title [WIP] ENH: safe-level SMOTE [MRG] ENH: safe-level SMOTE Nov 12, 2019
@laurallu
Author

1. Please use full names in variables, if you can. E.g. [`sl`](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/394d686364725763de8ea2cc3f504d8c08fe111a/imblearn/over_sampling/_smote.py#L1469) should be `safe_levels`. Unless it hurts the readability.

2. Could you add a section in the [documentation](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/master/doc/over_sampling.rst)?

I've made changes accordingly. I think it's probably ready to go through a detailed review.

@glemaitre
Member

I would suggest moving this implementation into smote_variants. The idea behind this move is to benchmark the SMOTE variants on a common set of a large number of datasets and include in imbalanced-learn only the versions that show an advantage. You can see the discussion and contribute to it: https://github.com/gykovacs/smote_variants/issues/14

@laurallu would this strategy be fine with you?

@chkoar
Member

chkoar commented Nov 17, 2019

Since Safe-Level SMOTE already exists there, IMHO we should review @laurallu's PR and merge it into imblearn.

@glemaitre
Member

OK, let's do that. Let's open an issue to discuss the inclusion criteria and explain what we expect in the future. I will review this PR in the near future.

@laurallu
Author

Thanks for pointing out smote_variants to me. I will check it out. I would love to see the inclusion criteria too, since I might code up more methods.

3 participants