
HistGradientBoosting counts and sample weights #26128

Open
@lorentzenchr

Description

Related issues: #25210

Current State

HistGradientBoostingClassifier and HistGradientBoostingRegressor both:

  • Calculate the sample count per histogram bin
  • Use the count when splitting (mostly to exclude split candidates)
  • Save the counts in the final trees and use them in partial dependence computations.
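As a hypothetical sketch (not scikit-learn's actual Cython implementation), each per-bin histogram entry accumulates a gradient sum, a hessian sum, and a sample count, mirroring the statistics listed above:

```python
def build_histogram(binned_feature, gradients, hessians, n_bins):
    """Accumulate per-bin statistics for one feature (illustrative only)."""
    hist = [{"sum_gradients": 0.0, "sum_hessians": 0.0, "count": 0}
            for _ in range(n_bins)]
    for b, g, h in zip(binned_feature, gradients, hessians):
        hist[b]["sum_gradients"] += g
        hist[b]["sum_hessians"] += h
        hist[b]["count"] += 1  # the count column this issue proposes to revisit
    return hist

# Toy data: 6 samples binned into 3 bins.
binned = [0, 1, 1, 2, 2, 2]
grads = [0.5, -0.2, 0.1, 0.3, -0.4, 0.2]
hess = [1.0] * 6  # unit hessians, e.g. squared loss

hist = build_histogram(binned, grads, hess, n_bins=3)
counts = [entry["count"] for entry in hist]
print(counts)  # [1, 2, 3]
```

Dropping the `count` field would shrink each histogram entry by a third, which is the source of the hoped-for speed-up in point 1 below.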

Proposition

  1. Evaluate whether removing the count from the histograms (LightGBM only sums gradients and hessians in histograms, no count) yields a meaningful speed-up.
    Edit: LightGBM uses an approximate count based on the hessian to check the minimum sample size, so this might not be what we want.
  2. Add an option to save counts and sample weights to the final trees at the very end of fit (where the binned training X is still available).
  3. Use partial dependence with method='recursion' if the above option was set; otherwise use method='brute'.
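The caveat in point 1 can be illustrated with a small hypothetical sketch of the hessian-based proxy: with unit hessians (squared loss) the hessian sum equals the exact count, but with log-loss hessians p * (1 - p) it can be far off, which is why the proxy might not be acceptable for a minimum-sample-size check:

```python
def approx_count(hessians):
    # LightGBM-style proxy: the hessian sum stands in for the sample count.
    return sum(hessians)

# Four samples in a leaf.
unit_hessians = [1.0, 1.0, 1.0, 1.0]            # squared loss: exact
p = [0.9, 0.8, 0.95, 0.7]                       # predicted probabilities
logloss_hessians = [pi * (1 - pi) for pi in p]  # log loss: p * (1 - p)

print(approx_count(unit_hessians))     # 4.0 == true count
print(approx_count(logloss_hessians))  # ~0.51, far below the true count of 4
```

A leaf with four confidently classified samples would thus look almost empty to a hessian-based min-sample check, while an exact count column reports 4 regardless of the loss.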

Why?
#25431 concluded that adding weights to the trees is too expensive. The above proposition gives the user a clear choice: faster training, or faster partial dependence afterwards.
