[Feature Request] Metrics for GP regression/classification #1857

Closed
@patel-zeel

Description

🚀 Feature Request

It would be great to have a set of probabilistic/Bayesian metrics to evaluate GP regression/classification models.

Motivation

Scikit-learn provides sklearn.metrics with standard metrics for regression and classification models. However, conventional metrics such as RMSE or R^2 are insufficient for evaluating GP regression models (and Bayesian regression models in general), mainly because they ignore the predictive variance and the rest of the posterior predictive distribution. The GP community has been using metrics such as NLPD (Negative Log Predictive Density) and MSLL (Mean Standardized Log Loss) (please see this issue for references that support this argument). It would therefore be a good idea to standardize these metrics and package them in a full-featured GP library such as GPyTorch. They would also help newcomers to the community who struggle to find, compute, compare, and benchmark widely used metrics.

Pitch

Describe the solution you'd like

Something like the following may be very helpful to the community.

from gpytorch.metrics import neg_log_predictive_density, mean_standardized_log_loss
nlpd = neg_log_predictive_density(model, likelihood, test_x, test_y)
msll = mean_standardized_log_loss(model, likelihood, test_x, test_y)

Metrics I am currently aware of:

  • NLPD (Negative Log Predictive Density)
  • MSLL (Mean Standardized Log Loss)
  • CE (Coverage Error): Absolute difference between the nominal level (x%) of a confidence interval and the percentage of ground-truth samples that actually lie within that interval.
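
To make the three metrics above concrete, here is a minimal sketch for the special case of independent Gaussian predictive distributions, written in plain Python. It is not GPyTorch's API: the function signatures (lists of predictive means and variances instead of `model` and `likelihood`), the helper `gaussian_nll`, and the use of the training-set mean and variance as the trivial baseline for MSLL are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def gaussian_nll(y, mean, var):
    """Negative log density of y under N(mean, var)."""
    return 0.5 * math.log(2 * math.pi * var) + (y - mean) ** 2 / (2 * var)


def neg_log_predictive_density(means, variances, targets):
    """Average negative log predictive density over the test points."""
    losses = [gaussian_nll(y, m, v) for y, m, v in zip(targets, means, variances)]
    return sum(losses) / len(losses)


def mean_standardized_log_loss(means, variances, targets, train_mean, train_var):
    """NLPD of the model minus the NLPD of a trivial Gaussian baseline.

    Assumption: the baseline predicts N(train_mean, train_var) for every
    test point. Negative values mean the model beats the baseline.
    """
    model_loss = neg_log_predictive_density(means, variances, targets)
    trivial_loss = sum(gaussian_nll(y, train_mean, train_var) for y in targets) / len(targets)
    return model_loss - trivial_loss


def coverage_error(means, variances, targets, level=0.95):
    """|empirical coverage - nominal level| for a central Gaussian interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # half-width in standard deviations
    inside = sum(
        abs(y - m) <= z * math.sqrt(v)
        for y, m, v in zip(targets, means, variances)
    )
    return abs(inside / len(targets) - level)
```

In a GP setting, the means and variances would come from the predictive distribution at `test_x`, including observation noise. For non-Gaussian likelihoods the log density would have to be evaluated differently, so this sketch covers only the Gaussian regression case.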

Describe alternatives you've considered

Manually calculating these metrics, which could be error-prone or time-consuming.

Are you willing to open a pull request? (We LOVE contributions!!!)
Definitely!

Additional context

Please see this issue
