[BUG?] Very large SHAP values for multi class model #6726
Description
I have recently built a multi-class model and have been examining its SHAP values.
Typically SHAP values fall in a range of around [-10, 10], but my model is producing SHAP values on the order of ± several thousand...
This is reproducible both with `lgbm_learner.predict(X_test, pred_contrib=True)` and with the SHAP package's `TreeExplainer`.
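For concreteness, this is roughly how I compare the two (a sketch, assuming `lgbm_learner` is the trained multi-class model and `X_test` the evaluation features mentioned above):

```python
import numpy as np
import shap

# LightGBM's built-in contributions: for a K-class model on X_test
# (n rows, m features), predict(..., pred_contrib=True) returns an
# array of shape (n, K * (m + 1)): per-class blocks of m feature
# contributions followed by that class's expected value.
contribs = lgbm_learner.predict(X_test, pred_contrib=True)
print("max |contribution| (LightGBM):", np.abs(contribs).max())

# The same quantities via the SHAP package's TreeExplainer;
# shap_values is per-class (a list of (n, m) arrays in older SHAP
# versions, a single 3-D array in recent ones).
explainer = shap.TreeExplainer(lgbm_learner)
shap_values = explainer.shap_values(X_test)
print("max |SHAP value| (TreeExplainer):", np.abs(np.asarray(shap_values)).max())
```

Both routes give me the same very large magnitudes.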
I have narrowed down a way to mitigate this issue by adjusting the LightGBM parameters. Specifically, either of the following works (see the sketch after this list):
- increasing `min_data_in_leaf` (to above 58)
- increasing `min_sum_hessian_in_leaf` (to above `1e-2`)
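A minimal sketch of the mitigation, assuming the native training API; `X_train`/`y_train` and `num_class=3` are placeholders standing in for my data:

```python
import lightgbm as lgb

params = {
    "objective": "multiclass",
    "num_class": 3,                     # placeholder: my real class count
    "min_data_in_leaf": 59,             # above 58 keeps SHAP values sane for me
    # "min_sum_hessian_in_leaf": 2e-2,  # alternative: anything above 1e-2
}
lgbm_learner = lgb.train(params, lgb.Dataset(X_train, label=y_train))

# With either setting, the contributions are back in a normal range on my data.
contribs = lgbm_learner.predict(X_test, pred_contrib=True)
```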
It is worth noting that I haven't found any issues with binary classification, just multi-class.
Reproducible example
I have tried very hard to produce an example of this with synthetic data, but I have not been able to recreate the problem (and unfortunately, I cannot share the data used).
This makes me think there is something quirky going on between my data and LightGBM.
My features are either `float64` or `int64`, and I have tried both an `int` and a `categorical` type target.
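For reference, this is the shape of one of my synthetic-data attempts (illustrative only: `make_classification` and the sizes here are not my exact script); on data like this the contributions stay small:

```python
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification

# Synthetic multi-class data with some integer-valued columns, loosely
# mirroring my real features (which are float64 or int64).
X, y = make_classification(n_samples=5000, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
X[:, :5] = np.round(X[:, :5] * 10)  # integer-valued, though still float64 here

# min_child_samples is the sklearn-API alias of min_data_in_leaf; a low
# value like this is the regime where my real data misbehaves.
model = lgb.LGBMClassifier(objective="multiclass", min_child_samples=1)
model.fit(X, y)

contribs = model.predict(X, pred_contrib=True)
print("max |contribution|:", np.abs(contribs).max())  # stays modest here
```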
Environment info
Python Version: python 3.11.10
LightGBM version: lightgbm 4.5.0
Command(s) you used to install LightGBM:
`conda install lightgbm`
Additional Comments
Even though I cannot reproduce this problem with synthetic data, I thought it would be good to report it (I hope this is ok).