[Feature Request] Metrics for GP regression/classification #1857
🚀 Feature Request
It would be great to have a set of probabilistic/Bayesian metrics to evaluate GP regression/classification models.
Motivation
Scikit-learn has sklearn.metrics, which provides standard metrics for regression and classification models. However, conventional metrics such as RMSE or R^2 are insufficient for evaluating GP regression models (and Bayesian regression models in general): they cannot take the predictive variance, let alone the entire posterior distribution, into account. The GP community has instead been using metrics such as NLPD (Negative Log Predictive Density) and MSLL (Mean Standardized Log Loss) (please see this issue for references supporting this argument). It would therefore be a good idea to standardize these metrics and ship them in a mature GP library such as GPyTorch. They would also help newcomers to the community who currently struggle to find, compute, compare, and benchmark these widely used metrics.
Pitch
Describe the solution you'd like
Something like the following would be very helpful to the community:
```python
from gpytorch.metrics import neg_log_predictive_density, mean_standardized_log_loss

nlpd = neg_log_predictive_density(model, likelihood, test_x, test_y)
msll = mean_standardized_log_loss(model, likelihood, test_x, test_y)
```
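Until such helpers exist, here is a minimal sketch of what `neg_log_predictive_density` could look like on top of the current API; the function body, the per-point normalization, and the use of `fast_pred_var` are my assumptions, not an existing GPyTorch interface:

```python
import torch
import gpytorch

def neg_log_predictive_density(model, likelihood, test_x, test_y):
    # Hypothetical implementation: average negative log density of the
    # held-out targets under the predictive distribution p(y* | x*, D).
    model.eval()
    likelihood.eval()
    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        pred_dist = likelihood(model(test_x))  # predictive distribution over y*
        # log_prob is the joint log density of test_y; dividing by the number
        # of test points gives a per-point value that is comparable across
        # test sets of different sizes.
        return -pred_dist.log_prob(test_y) / test_y.shape[-1]
```

Lower is better here: a small NLPD means the model assigns high probability to the held-out data, so it penalizes both over- and under-confident predictive variances, which RMSE cannot do.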
A list of metrics I am currently aware of (definitions are sketched after the list):
- NLPD (Negative Log Predictive Density)
- MSLL (Mean Standardized Log Loss)
- CE (Coverage Error): Absolute difference between the nominal coverage of an x% confidence interval and the fraction of ground-truth samples that actually fall within that interval.
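For concreteness, the definitions I have in mind (NLPD and MSLL following Rasmussen & Williams, Gaussian Processes for Machine Learning, 2006, §2.5; the CE formula simply restates the description above, and the notation is mine):

```latex
% Negative log predictive density over n test points (x_i, y_i):
\mathrm{NLPD} = -\frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid x_i, \mathcal{D})

% MSLL standardizes the per-point log loss against a trivial Gaussian
% fitted to the training targets (mean \bar{y}, variance s_y^2):
\mathrm{MSLL} = \frac{1}{n} \sum_{i=1}^{n}
    \left[ -\log p(y_i \mid x_i, \mathcal{D})
           + \log \mathcal{N}(y_i \mid \bar{y}, s_y^2) \right]

% Coverage error at nominal level \alpha (e.g. \alpha = 0.95):
\mathrm{CE}(\alpha) = \left| \alpha
    - \frac{1}{n} \sum_{i=1}^{n}
      \mathbf{1}\left[ y_i \in \mathrm{CI}_\alpha(x_i) \right] \right|
```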
Describe alternatives you've considered
Manually calculating these metrics, which is error-prone and time-consuming.
Are you willing to open a pull request? (We LOVE contributions!!!)
Definitely!
Additional context
Please see this issue