Use sklearn RepeatedKFold cross validation routine. #115

Merged 1 commit on Sep 10, 2018
4 changes: 0 additions & 4 deletions README.md
@@ -228,7 +228,3 @@ Here is the corresponding Bibtex entry
### Convergence test

* ["The graphical lasso: New Insights and alternatives"](https://web.stanford.edu/~hastie/Papers/glassoinsights.pdf) Mazumder and Hastie, 2012.

-### Repeated KFold cross-validation
-
-* ["Cross-validation pitfalls when selecting and assessing regression and classification models"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3994246/) D. Krstajic, L. Buturovic, D. Leahy, and S. Thomas, 2014.
Review comment (Member):

I'd like to keep this reference on why to use RepeatedKFold. I find that many people don't know that repeating the cross-validation is necessary, and the reference explains our preferred approach.
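
For readers unfamiliar with the technique the comment refers to, here is a minimal sketch of repeated K-fold splitting with scikit-learn's `RepeatedKFold`, the class this PR switches to. The toy data and the values `n_splits=3`, `n_repeats=2`, and `random_state=0` are illustrative assumptions, not settings taken from this project.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

# Toy data: 6 samples, 2 features (illustrative only).
X = np.arange(12).reshape(6, 2)

# 3 folds, repeated 2 times; each repeat reshuffles the samples.
rkf = RepeatedKFold(n_splits=3, n_repeats=2, random_state=0)

# split() yields (train_indices, test_indices) pairs, one repeat after another.
splits = list(rkf.split(X))

# Total number of train/test pairs is n_splits * n_repeats.
n_total = len(splits)

# Within a single repeat, the test folds together cover every sample once.
first_repeat_test = np.sort(np.concatenate([test for _, test in splits[:3]]))
```

Averaging a score over all of these pairs, rather than over a single K-fold partition, reduces the dependence of the estimate on one particular random split.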

2 changes: 0 additions & 2 deletions inverse_covariance/__init__.py
@@ -13,7 +13,6 @@
from .rank_correlation import spearman_correlation, kendalltau_correlation
from .model_average import ModelAverage
from .adaptive_graph_lasso import AdaptiveGraphLasso, AdaptiveGraphicalLasso
-from .cross_validation import RepeatedKFold

__all__ = [
"InverseCovarianceEstimator",
@@ -33,5 +32,4 @@
"ModelAverage",
"AdaptiveGraphLasso",
"AdaptiveGraphicalLasso",
-"RepeatedKFold",
]
119 changes: 0 additions & 119 deletions inverse_covariance/cross_validation.py

This file was deleted.

11 changes: 4 additions & 7 deletions inverse_covariance/quic_graph_lasso.py
@@ -10,9 +10,7 @@
from sklearn.utils import check_array, as_float_array, deprecated
from sklearn.utils.testing import assert_array_almost_equal
from sklearn.externals.joblib import Parallel, delayed
-from sklearn.model_selection import cross_val_score # NOQA >= 0.18
-
-# from sklearn.cross_validation import cross_val_score # NOQA < 0.18
+from sklearn.model_selection import cross_val_score, RepeatedKFold

from . import pyquic
from .inverse_covariance import (
@@ -21,7 +19,6 @@
_compute_error,
_validate_path,
)
-from .cross_validation import RepeatedKFold


def quic(
@@ -625,7 +622,7 @@ def fit(self, X, y=None):
elif isinstance(self.cv, tuple):
cv = self.cv

-cv = RepeatedKFold(X.shape[0], n_folds=cv[0], n_trials=cv[1])
+cv = RepeatedKFold(n_splits=cv[0], n_repeats=cv[1])

self.init_coefs(X)

@@ -662,11 +659,11 @@ def fit(self, X, y=None):
score_metric=self.score_metric,
init_method=self.init_method,
)
-for train, test in cv
+for train, test in cv.split(X)
)
else:
# parallel via spark
-train_test_grid = [(train, test) for (train, test) in cv]
+train_test_grid = [(train, test) for (train, test) in cv.split(X)]
indexed_param_grid = list(
zip(range(len(train_test_grid)), train_test_grid)
)
35 changes: 0 additions & 35 deletions inverse_covariance/tests/cross_validation_test.py

This file was deleted.
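
The call-site change in `quic_graph_lasso.py` above follows the scikit-learn splitter convention: the splitter is built without the sample count, and index arrays come from `split(X)`. The sketch below illustrates that pattern under stated assumptions: the data is synthetic, `cv_spec` is a hypothetical stand-in for the `(folds, trials)` tuple in the diff, and `EmpiricalCovariance` is a placeholder estimator rather than this project's own classes.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic data, illustrative only: 60 samples, 4 features.
rng = np.random.RandomState(0)
X = rng.randn(60, 4)

# Hypothetical (folds, trials) tuple, mirroring the `cv` tuple in the diff.
cv_spec = (3, 2)
cv = RepeatedKFold(n_splits=cv_spec[0], n_repeats=cv_spec[1], random_state=0)

# The splitter is not iterable by itself; train/test indices come from split(X).
fold_sizes = [(len(train), len(test)) for train, test in cv.split(X)]

# The same splitter can be passed directly to cross_val_score.
# EmpiricalCovariance.score returns a log-likelihood on the held-out samples.
scores = cross_val_score(EmpiricalCovariance(), X, cv=cv)
mean_score = scores.mean()
```

Passing `random_state` is optional; it is set here only so that the splits are reproducible.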