
[BUG?] Very large SHAP values for multi-class model #6726

Open
@benedictjones

Description

I have recently built a multi-class model and have been examining its SHAP values.
Typically SHAP values fall in a range of roughly [-10, 10], but my model is producing SHAP values in the range of +/- thousands...
This is reproducible both via lgbm_learner.predict(X_test, pred_contrib=True) and via the SHAP package's TreeExplainer.
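For reference, this is roughly how I compared the two (a minimal sketch; `lgbm_learner` and `X_test` stand in for my fitted model and data, and the exact return shapes depend on the lightgbm/shap versions):

```python
import numpy as np
import shap

# LightGBM's built-in contributions: for a k-class model this returns an
# array of shape (n_samples, k * (n_features + 1)) -- per class, one column
# per feature plus a final bias column.
contribs = lgbm_learner.predict(X_test, pred_contrib=True)
print("max |pred_contrib|:", np.abs(contribs).max())

# The same explanation via the shap package. Depending on the shap version,
# shap_values is a list of k (n_samples, n_features) arrays or a 3-D array.
explainer = shap.TreeExplainer(lgbm_learner)
shap_values = explainer.shap_values(X_test)
print("max |TreeExplainer|:", np.abs(np.asarray(shap_values)).max())
```

Both methods report the same +/- thousands magnitudes, so this does not look like a shap-package problem.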

I have narrowed down a way to mitigate this issue by adjusting the LightGBM parameters.
Specifically, either of the following works (see the sketch after this list):

  • increasing min_data_in_leaf (to above 58)
  • increasing min_sum_hessian_in_leaf (to above 1e-2)
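A minimal sketch of where those parameters go (using the native lgb.train API; num_class=3 and the training variables are placeholders, and the thresholds came from manual bisection on my data):

```python
import lightgbm as lgb

params = {
    "objective": "multiclass",
    "num_class": 3,                   # placeholder: my real class count differs
    "min_data_in_leaf": 60,           # above the ~58 threshold I found
    "min_sum_hessian_in_leaf": 1e-1,  # above the ~1e-2 threshold I found
}
# With either of these raised, the SHAP values return to a sensible range.
booster = lgb.train(params, lgb.Dataset(X_train, label=y_train))
```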

It is worth noting that I haven't found any issues with binary prediction, only multi-class.

Reproducible example

I have tried very hard to produce an example of this with synthetic data, but I have not been able to recreate the problem (and unfortunately, I cannot share the data used).
This makes me think there is something quirky going on between my data and LightGBM.
My features are either float64 or int64, and I have tried both an int and a categorical target.
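For completeness, my attempts followed roughly this pattern (a sketch only; the make_classification setup is illustrative and does not trigger the bug):

```python
import lightgbm as lgb
import numpy as np
from sklearn.datasets import make_classification

# Illustrative synthetic setup (not my real data): a 3-class problem with
# numeric features. This does NOT reproduce the huge SHAP values for me.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10, n_classes=3, random_state=0
)
booster = lgb.train(
    {"objective": "multiclass", "num_class": 3, "verbosity": -1},
    lgb.Dataset(X, label=y),
)
contribs = booster.predict(X, pred_contrib=True)
print("max |contribution|:", np.abs(contribs).max())  # stays small here
```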

Environment info

Python Version: python 3.11.10
LightGBM version: lightgbm 4.5.0

Command(s) you used to install LightGBM

conda install lightgbm

Additional Comments

Even though I cannot reproduce this problem with synthetic data, I thought it would be good to report it (I hope this is ok).
