[ENH] Keep the index of the samples after undersampling #724

Closed
qiiiibeau opened this issue Jun 11, 2020 · 2 comments

@qiiiibeau

qiiiibeau commented Jun 11, 2020

Hello, I'm undersampling some imbalanced data in which each sample has a unique name as its index. I don't want to lose the samples' index after undersampling because I'm working on a graph-based task where each sample represents a node, and I need to know where it is located in the graph.

To illustrate, my data is a dataframe that looks like this:

        feat_1  feat_2  feat_3  label
Thomas  0.5     2.2     3.0     1
Kelly   0.63    1.5     1.4     0
Peter   0.9     1.1     3.4     1
George  0.2     2.1     4       1
...     ...     ...     ...     ...

In the current version of imblearn, undersampling methods such as RandomUnderSampler().fit_resample() return a dataframe whose index is reset to positions running from 0 up to the number of selected samples, such as:

   feat_1  feat_2  feat_3  label
0  0.5     2.2     3.0     1
1  0.2     2.1     4       1

where all the original index labels are lost. I need it to look like this:

        feat_1  feat_2  feat_3  label
Thomas  0.5     2.2     3.0     1
George  0.2     2.1     4       1

This improvement would help a lot for graph-based imbalanced learning and maybe also in other cases.

Thank you.

@glemaitre
Member

We might want to add support for this feature for samplers that expose a fitted attribute sample_indices_ after fit.
For samplers without it, the index is meaningless. However, this would make the behaviour differ from one sampler to another, whereas a user can easily reassign the index themselves, which would be less surprising:

df_res, y_res = sampler.fit_resample(df, y)
df_res.index = df.index[sampler.sample_indices_]

@chkoar do you have any thoughts on this?

@glemaitre
Member

This feature was added for the RandomUnderSampler and RandomOverSampler.
