BinningProcess: error in binning_transform_params parameter with metric = bins #266
Description
Hello,
Thank you for your great work on this excellent module, really helpful.
I want to use BinningProcess to transform columns in a sklearn pipeline. However, I would like BinningProcess to return bins instead of the mean of the target variable, so that the bin names are meaningful. To present my issue, I reproduce it below outside of the sklearn pipeline.
My understanding of the BinningProcess documentation is that I can control the binning output format in two ways:
- Either within the .transform method, with the option metric = "bins":
import numpy as np
import pandas as pd
from random import choices, uniform
from optbinning import BinningProcess
df = pd.DataFrame({'continuous_feature': choices(range(0, 30), k=95) + [np.nan] * 5,
                   'cat_feature': choices(['A', 'B', 'C'], k=100),
                   'target': [uniform(15, 16) for _ in range(100)]})
all_features = ["continuous_feature", "cat_feature"]
X = df.loc[:, all_features]
y = df.loc[:, 'target']
BinningProcess(all_features).fit_transform(X, y, metric="bins")
This works fine and returns the desired table of bin labels.
However, since I eventually want to use BinningProcess in a pipeline, I cannot use this .transform method option.
- Or within the BinningProcess constructor, with the option binning_transform_params.
The equivalent code should be:
df = pd.DataFrame({'continuous_feature': choices(range(0, 30), k=95) + [np.nan] * 5,
                   'cat_feature': choices(['A', 'B', 'C'], k=100),
                   'target': [uniform(15, 16) for _ in range(100)]})
all_features = ["continuous_feature", "cat_feature"]
X = df.loc[:, all_features]
y = df.loc[:, 'target']
# Request bin labels per variable through the constructor instead of transform()
BinningProcess(all_features,
               binning_transform_params={"continuous_feature": {"metric": "bins"},
                                         "cat_feature": {"metric": "bins"}}).fit_transform(X, y)
Unfortunately, this yields an error:
ValueError: could not convert string to float: '(-inf, 4.50)'
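My guess at the cause, which I have not checked against the optbinning source: the message is exactly what Python raises when an interval label is coerced to float, so the per-variable "bins" output may be written into a numeric output container. The helper coerce_to_float below is hypothetical and only mimics that situation; it is not optbinning code.

```python
def coerce_to_float(values):
    """Mimic writing transformed values into a float-only output column.

    Hypothetical helper for illustration; not part of optbinning.
    """
    return [float(v) for v in values]


# Numeric outputs such as integer bin indices convert without trouble.
indices = coerce_to_float([0, 1, 2])

# Interval labels, like those returned with metric="bins", cannot be converted.
try:
    coerce_to_float(["(-inf, 4.50)", "[4.50, 12.50)"])
except ValueError as exc:
    message = str(exc)
    print(message)
```

If this assumption holds, any per-variable metric that returns numbers would pass, while string labels would fail at the first value.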
What could I do to prevent this error?
Also, binning_transform_params works well if I use an option other than "bins", e.g. "indices": {"continuous_feature": {"metric": "indices"}, "cat_feature": {"metric": "indices"}}
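In the meantime, one workaround I am considering is a thin wrapper that always forwards metric = "bins" to .transform, since that call works for me outside the pipeline. This is only a minimal sketch: FixedMetricTransformer is a hypothetical name of my own, it is duck-typed (it assumes the wrapped object exposes fit(X, y) and transform(X, metric=...)), and I have not tested it inside a real sklearn Pipeline.

```python
class FixedMetricTransformer:
    """Wrap a binning object and always pass the same `metric` to transform().

    Hypothetical wrapper for illustration; not part of optbinning or sklearn.
    """

    def __init__(self, binner, metric="bins"):
        self.binner = binner
        self.metric = metric

    def fit(self, X, y=None):
        # Delegate fitting to the wrapped object.
        self.binner.fit(X, y)
        return self

    def transform(self, X):
        # Forward the fixed metric, which a pipeline step cannot pass itself.
        return self.binner.transform(X, metric=self.metric)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# Intended use (assumption, untested):
#   FixedMetricTransformer(BinningProcess(all_features))
```

For use with sklearn utilities that clone estimators, the class would probably also need to derive from sklearn.base.BaseEstimator and TransformerMixin.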