Fix categorical encoding for DLE #224

Merged
merged 1 commit into from
Feb 23, 2023
@nikml nikml commented Feb 20, 2023

LightGBM treats negative values for categorical features as missing values.

We currently encode unseen values for categorical features to -1.

This PR adds 1 to the whole encoded feature, so known values are encoded as 1..N instead of 0..N-1. Unseen values therefore end up as 0, which is a valid value for LightGBM categorical features.
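
The effect of the shift can be illustrated with a small pure-Python sketch. This is not the code from this PR: `fit_codes`, `encode`, and `UNSEEN` are hypothetical names introduced for illustration, and the sorted-order assignment of codes is an assumption.

```python
from typing import Dict, Hashable, Iterable, List

# Hypothetical sentinel: code given to categories not present at fit time.
UNSEEN = -1


def fit_codes(values: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Map each distinct category to an integer in 0..N-1 (sorted order)."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode(values: Iterable[Hashable], codes: Dict[Hashable, int], shift: int = 1) -> List[int]:
    """Encode values, sending unseen categories to UNSEEN, then add `shift`.

    With shift=1, known categories land in 1..N and unseen ones at 0,
    so no encoded value is negative.
    """
    return [codes.get(value, UNSEEN) + shift for value in values]


codes = fit_codes(["red", "green", "blue"])  # blue=0, green=1, red=2

# Without the shift, the unseen category "purple" is encoded as -1,
# which LightGBM would treat as a missing value.
unshifted = encode(["green", "purple", "blue"], codes, shift=0)

# With the shift, "purple" becomes 0 and known categories move to 1..N.
shifted = encode(["green", "purple", "blue"], codes, shift=1)
```

Here `unshifted` contains a negative code for the unseen category, while every entry of `shifted` is non-negative, so the unseen category is kept as its own category level instead of being treated as missing.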

[Screenshot: "Advanced Topics" page of the LightGBM 3.3.5.99 documentation]

@nikml nikml requested a review from nnansters as a code owner February 20, 2023 17:17
codecov bot commented Feb 20, 2023

Codecov Report

Base: 79.26% // Head: 79.27% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (0fa7e05) compared to base (06c5642).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #224   +/-   ##
=======================================
  Coverage   79.26%   79.27%           
=======================================
  Files          73       73           
  Lines        5792     5794    +2     
  Branches      904      904           
=======================================
+ Hits         4591     4593    +2     
  Misses        964      964           
  Partials      237      237           
| Impacted Files | Coverage Δ | |
|---|---|---|
| nannyml/plots/util.py | 74.62% <ø> (ø) | |
| ...rformance_estimation/direct_loss_estimation/dle.py | 93.06% <100.00%> (+0.14%) | ⬆️ |


@nnansters nnansters merged commit 6330170 into main Feb 23, 2023
@nnansters nnansters deleted the fix-dle-cat-encoding branch February 23, 2023 14:22