BinningProcess: error in binning_transform_params parameter with metric = bins #266

Closed
@max-franceschi

Description

Hello,
Thank you for your great work on this excellent module, really helpful.

I want to use BinningProcess to transform columns in a sklearn pipeline. However, I would like BinningProcess to return the bins rather than the mean of the target variable, so that the output has meaningful bin names. To present my issue, I reproduce it below outside of a sklearn pipeline.

My understanding of the BinningProcess documentation is that I can handle the binning output format:

  1. Either in the .transform method, with the option metric = "bins".

from random import choices, uniform

import numpy as np
import pandas as pd
from optbinning import BinningProcess

# Toy data: a numeric feature with 5 missing values, a categorical feature, a continuous target
df = pd.DataFrame({'continuous_feature': choices(range(0, 30), k=95) + [np.nan] * 5,
                   'cat_feature': choices(['A', 'B', 'C'], k=100),
                   'target': [uniform(15, 16) for _ in range(100)]})

all_features = ["continuous_feature", "cat_feature"]
X = df.loc[:, all_features]
y = df.loc[:, 'target']

BinningProcess(all_features).fit_transform(X, y, metric="bins")

This works fine and I obtain the desired table:

[image: table of bin labels]

However, since I eventually want to use BinningProcess in a pipeline, I cannot use this .transform method option.
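One possible way around this limitation is a thin wrapper whose transform method always forwards a fixed metric. The following is only a sketch under assumptions: `FixedMetricTransformer` and `StubBinner` are hypothetical names, the stub stands in for BinningProcess so the snippet runs without optbinning, and the wrapper has not been checked inside a real sklearn Pipeline (it does not implement get_params, so cloning would likely fail).

```python
class FixedMetricTransformer:
    """Hypothetical wrapper: forwards a fixed `metric` to the wrapped transform()."""

    def __init__(self, binner, metric="bins"):
        self.binner = binner
        self.metric = metric

    def fit(self, X, y=None):
        self.binner.fit(X, y)
        return self

    def transform(self, X):
        # The caller never has to pass `metric`; it is fixed at construction time.
        return self.binner.transform(X, metric=self.metric)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class StubBinner:
    """Stand-in for BinningProcess (assumption: only fit and transform(X, metric=...) are needed)."""

    def fit(self, X, y=None):
        return self

    def transform(self, X, metric="mean"):
        return [f"{metric}:{x}" for x in X]


result = FixedMetricTransformer(StubBinner(), metric="bins").fit_transform([1, 2, 3])
print(result)
```

With the real library, the stub would be replaced by BinningProcess(all_features); whether that behaves correctly as a pipeline step is untested here.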

  2. Or in the BinningProcess constructor, with the option binning_transform_params.

The equivalent code should be:

# Same setup as in the first example
df = pd.DataFrame({'continuous_feature': choices(range(0, 30), k=95) + [np.nan] * 5,
                   'cat_feature': choices(['A', 'B', 'C'], k=100),
                   'target': [uniform(15, 16) for _ in range(100)]})
all_features = ["continuous_feature", "cat_feature"]

X = df.loc[:, all_features]
y = df.loc[:, 'target']

# Request bin labels per variable through the constructor instead of transform()
BinningProcess(all_features,
               binning_transform_params={"continuous_feature": {"metric": "bins"},
                                         "cat_feature": {"metric": "bins"}}).fit_transform(X, y)

Unfortunately, this raises an error:

ValueError: could not convert string to float: '(-inf, 4.50)'
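
The message has the same form as the one Python raises when a bin label string is cast to float. That suggests the labels are being forced into a numeric type somewhere, although this is an assumption and has not been traced in the optbinning source. A minimal reproduction of the message alone, without optbinning:

```python
label = "(-inf, 4.50)"  # a bin label like the one in the error above

try:
    float(label)  # a label string is not a valid float literal
except ValueError as exc:
    message = str(exc)

print(message)  # could not convert string to float: '(-inf, 4.50)'
```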

What could I do to prevent this error?

Also, binning_transform_params works well if I use a metric other than "bins", e.g. "indices", with {"continuous_feature": {"metric": "indices"}, "cat_feature": {"metric": "indices"}}:

[image: table of bin indices]
