Categorical (binary) column incorrectly treated as continuous for Univariate Drift Detection #171
Comments
Hey Nikos, this behavior is correct. NannyML designates each column as either continuous or categorical. You are right, however, that this is not the expected behavior given the example in the docs. This can be fixed by explicitly setting the dtype of the column to categorical:

    import nannyml as nml

    # reference_df and analysis_df are the DataFrames loaded earlier in the tutorial.
    # Cast the binary prediction column so it is treated as categorical.
    reference_df['y_pred'] = reference_df['y_pred'].astype("category")
    analysis_df['y_pred'] = analysis_df['y_pred'].astype("category")

    column_names = [
        'distance_from_office', 'salary_range', 'gas_price_per_litre',
        'public_transportation_cost', 'wfh_prev_workday', 'workday', 'tenure',
        'y_pred_proba', 'y_pred',
    ]
    calc = nml.UnivariateDriftCalculator(
        column_names=column_names,
        timestamp_column_name='timestamp',
        continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
        categorical_methods=['chi2', 'jensen_shannon'],
    )
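To see why the cast changes the outcome, here is a minimal sketch that assumes the continuous/categorical split is driven by the pandas dtype of each column. The helper `split_by_dtype` is hypothetical and written only for illustration; it is not NannyML's implementation, and the toy data is made up.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def split_by_dtype(df, column_names):
    """Hypothetical helper: numeric dtypes count as continuous, the rest as categorical."""
    continuous = [c for c in column_names if is_numeric_dtype(df[c])]
    categorical = [c for c in column_names if c not in continuous]
    return continuous, categorical


# Toy data: y_pred holds 0/1 integers, so pandas infers an integer dtype.
df = pd.DataFrame({
    "y_pred": [0, 1, 1, 0],
    "y_pred_proba": [0.1, 0.8, 0.7, 0.3],
})
columns = ["y_pred", "y_pred_proba"]

# Before the cast, the integer column lands in the continuous group.
before = split_by_dtype(df, columns)

# After the cast, the same column lands in the categorical group.
df["y_pred"] = df["y_pred"].astype("category")
after = split_by_dtype(df, columns)

print(before)
print(after)
```

Under that assumption, a 0/1 integer column is indistinguishable from any other numeric feature until its dtype is set explicitly.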
Signed-off-by: niels <niels@nannyml.com>
I looked a bit further into this. The quickstart is also affected. And actually the issue was introduced in version So we should also fix that and see if the documentation needs a little more polishing.
* Many Updates to Univariate Drift Comparison
* Update Univariate Drift Tutorial
* Update Readme, fixing incorrect images for drift
* Remove unneeded drift images
* Fix PCA How it works page showing outdated code.
* Fix realized regression performance docs and relevant readme plot
* Remove unneeded realized performance images
* Fix quickstart re #171

Co-authored-by: cartgr <carterblair@uvic.ca>
Co-authored-by: Jakub Bialek <jakub@nannyml.com>
Closing, as the quickstart also received a hotfix. We can polish the docs later.
Describe the bug
The binary predictions from the synthetic binary classification are treated as continuous rather than categorical.
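For context on what a categorical treatment of a 0/1 column looks like, the sketch below computes the textbook Pearson chi-squared statistic on class counts from two samples. It uses only the standard library; the function name `chi2_statistic` and the sample values are made up for illustration, and this is not NannyML's implementation of its `chi2` method.

```python
def chi2_statistic(reference, analysis):
    """Pearson chi-squared statistic for category counts in two non-empty samples."""
    categories = sorted(set(reference) | set(analysis))
    # Rows are samples, columns are categories.
    observed = [[sample.count(c) for c in categories] for sample in (reference, analysis)]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if both samples shared the same class distribution.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


# Made-up binary predictions: the share of 1s rises from 50% to 70%.
reference_y_pred = [0] * 50 + [1] * 50
analysis_y_pred = [0] * 30 + [1] * 70

stat = chi2_statistic(reference_y_pred, analysis_y_pred)
print(stat)
```

The statistic depends only on how often each class occurs, which is the natural comparison for binary predictions; a continuous method would instead treat 0 and 1 as points on a numeric scale.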
To Reproduce
Steps to reproduce the behavior:
Run the Univariate Drift example notebook from which the documentation is generated.
y_pred is treated as continuous instead of categorical.
Expected behavior
The column would be treated as categorical.
Screenshots & scripts
The variable is present in the continuous drift results for v0.8.1:
https://nannyml.readthedocs.io/en/v0.8.1/_images/drift-guide-continuous.svg