
Sometimes counterfactuals generated with the random method have the wrong class #417

Open
@benediktsatalia

Description

I encountered situations where the returned counterfactuals do not have the desired class. It happens only sometimes, so I had to experiment with random seeds to get a reproducible example. I boiled it down to a simple example based on the getting-started notebook.

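This is the output the code produces. Note that counterfactual 0 still has income 0 (the original outcome), even though the desired outcome is 1: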

Query instance (original outcome : 0)
   age workclass education marital_status    occupation   race gender  hours_per_week  income
0   32   Private   HS-grad        Married  White-Collar  White   Male              60       0

Diverse Counterfactual set (new outcome: 1)
   age workclass  education marital_status    occupation   race gender  hours_per_week  income
0   61   Private    HS-grad        Married  Professional  White   Male              60       0
1   32   Private  Bachelors        Married  White-Collar  White   Male              60       1
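The mismatch can be confirmed mechanically by comparing each counterfactual's stored income with the desired outcome of 1. Below is a minimal, standard-library-only sketch. The helper name wrong_class_rows is hypothetical (it is not part of DiCE), and the rows are transcribed by hand from the output above, keeping only a few of the columns.

```python
def wrong_class_rows(cf_rows, outcome_name, desired_class):
    """Return the indices of counterfactual rows whose outcome differs from desired_class."""
    return [i for i, row in enumerate(cf_rows) if row[outcome_name] != desired_class]


# The two counterfactuals from the output above (subset of columns, hand-transcribed).
cfs = [
    {"age": 61, "education": "HS-grad", "occupation": "Professional", "income": 0},
    {"age": 32, "education": "Bachelors", "occupation": "White-Collar", "income": 1},
]

print(wrong_class_rows(cfs, "income", desired_class=1))  # [0]
```

Against a live result, the same helper could presumably be given e1.cf_examples_list[0].final_cfs_df.to_dict("records"), assuming that is where DiCE keeps the final counterfactual table for this method.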

Code to reproduce:

# Sklearn imports
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
# StandardScaler lives in sklearn.preprocessing (not sklearn.discriminant_analysis)
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# DiCE imports
import dice_ml
from dice_ml.utils import helpers  # helper functions

dataset = helpers.load_adult_income_dataset()
dataset = dataset.sample(1000, random_state=1)

y_train = dataset["income"]
x_train = dataset.drop('income', axis=1)

# Step 1: dice_ml.Data
d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')


numerical = ["age", "hours_per_week"]
categorical = x_train.columns.difference(numerical)

# We create the preprocessing pipelines for both numeric and categorical data.
numeric_transformer = Pipeline(steps=[("scaler", StandardScaler())])

categorical_transformer = Pipeline(steps=[("onehot", OneHotEncoder(handle_unknown="ignore"))])

transformations = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numerical),
        ("cat", categorical_transformer, categorical),
    ]
)

# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf = Pipeline(
    steps=[("preprocessor", transformations), ("classifier", RandomForestClassifier(random_state=1))]
)
model = clf.fit(x_train, y_train)

# Using sklearn backend
m = dice_ml.Model(model=model, backend="sklearn")
# Using method=random for generating CFs
exp = dice_ml.Dice(d, m, method="random")

# desired_class="opposite" requests the class other than the model's prediction for this row
e1 = exp.generate_counterfactuals(x_train[4:5], total_CFs=2, desired_class="opposite", random_seed=6)
e1.visualize_as_dataframe()
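Because the income column in the printed table is filled in by DiCE, it may also help to re-run the model on the counterfactual features and check whether it agrees. That separates "the explainer returned a point that does not flip the model" from "the displayed label is wrong". The sketch below is written under that assumption; audit_counterfactuals and toy_predict are hypothetical names, and toy_predict is only a stand-in for model.predict so the snippet runs without DiCE or scikit-learn.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def audit_counterfactuals(
    cf_rows: Sequence[Row],
    outcome_name: str,
    desired_class: int,
    predict: Callable[[List[Row]], Sequence[int]],
) -> List[Dict[str, object]]:
    """Re-predict each counterfactual and report rows not in desired_class.

    For each problem row, records both the label stored in the counterfactual
    table and the label the model assigns to the feature values alone.
    """
    # Drop the outcome column so the model only sees the features.
    features = [{k: v for k, v in row.items() if k != outcome_name} for row in cf_rows]
    predicted = list(predict(features))
    problems = []
    for i, (row, pred) in enumerate(zip(cf_rows, predicted)):
        if pred != desired_class or row[outcome_name] != desired_class:
            problems.append({"index": i, "stored": row[outcome_name], "predicted": pred})
    return problems


# Hypothetical stand-in for model.predict: predicts 1 only for Bachelors.
def toy_predict(rows: List[Row]) -> List[int]:
    return [1 if r["education"] == "Bachelors" else 0 for r in rows]


cfs = [
    {"age": 61, "education": "HS-grad", "occupation": "Professional", "income": 0},
    {"age": 32, "education": "Bachelors", "occupation": "White-Collar", "income": 1},
]

print(audit_counterfactuals(cfs, "income", 1, toy_predict))
```

With the reproduction above, one would presumably pass cf_rows=e1.cf_examples_list[0].final_cfs_df.to_dict("records"), predict=lambda rows: model.predict(pd.DataFrame(rows)) (after import pandas as pd), and desired_class=1 - model.predict(x_train[4:5])[0], since "opposite" on a binary target means the other class. A reported row whose predicted value also differs from the desired class would point to a genuine failure to flip the model rather than a display problem.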
