Hello, I'm undersampling imbalanced data in which each sample has a unique name as its index. I don't want to lose the samples' index after undersampling: I'm working on a graph-based task where each sample represents a node, so I need to know where each sample is located in the graph.
To illustrate, my data is a dataframe that looks like this:
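|        | feat_1 | feat_2 | feat_3 | label |
|--------|--------|--------|--------|-------|
| Thomas | 0.5    | 2.2    | 3.0    | 1     |
| Kelly  | 0.63   | 1.5    | 1.4    | 0     |
| Peter  | 0.9    | 1.1    | 3.4    | 1     |
| George | 0.2    | 2.1    | 4      | 1     |
| ...    | ...    | ...    | ...    | ...   |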
The current version of imblearn's undersampling methods, e.g. `RandomUnderSampler().fit_resample()`, returns a dataframe with the index [0: length of selected samples], such as:
|   | feat_1 | feat_2 | feat_3 | label |
|---|--------|--------|--------|-------|
| 0 | 0.5    | 2.2    | 3.0    | 1     |
| 1 | 0.2    | 2.1    | 4      | 1     |
where all the original index values are lost. I need it to look like this:
|        | feat_1 | feat_2 | feat_3 | label |
|--------|--------|--------|--------|-------|
| Thomas | 0.5    | 2.2    | 3.0    | 1     |
| George | 0.2    | 2.1    | 4      | 1     |
This improvement would help a lot for graph-based imbalanced learning and maybe also in other cases.
Thank you.
We might want to add support for this feature for samplers having a fitted attribute sample_indices_ after fit.
Otherwise, the index is meaningless. However, this would make the behaviour differ from one sampler to another, whereas a user can easily reassign the index, which would be less surprising: