Skip to content

skfolio/skfolio

Licence Codecov Black PythonVersion PyPi CI/CD Downloads Ruff Contribution Website

skfolio documentation skfolio

skfolio is a Python library for portfolio optimization built on top of scikit-learn. It offers a unified interface and tools compatible with scikit-learn to build, fine-tune, and cross-validate portfolio models.

It is distributed under the open source 3-Clause BSD license.

examples

Important links

Installation

skfolio is available on PyPI and can be installed with:

pip install -U skfolio

Dependencies

skfolio requires:

  • python (>= 3.10)
  • numpy (>= 1.23.4)
  • scipy (>= 1.8.0)
  • pandas (>= 1.4.1)
  • cvxpy (>= 1.4.1)
  • scikit-learn (>= 1.3.2)
  • joblib (>= 1.3.2)
  • plotly (>= 5.15.0)

Key Concepts

Since the development of modern portfolio theory by Markowitz (1952), mean-variance optimization (MVO) has received considerable attention.

Unfortunately, it faces a number of shortcomings, including high sensitivity to the input parameters (expected returns and covariance), weight concentration, high turnover, and poor out-of-sample performance.

It is well known that naive allocation (1/N, inverse-vol, etc.) tends to outperform MVO out-of-sample (DeMiguel, 2007).

Numerous approaches have been developed to alleviate these shortcomings (shrinkage, additional constraints, regularization, uncertainty set, higher moments, Bayesian approaches, coherent risk measures, left-tail risk optimization, distributionally robust optimization, factor model, risk-parity, hierarchical clustering, ensemble methods, pre-selection, etc.).

With this large number of methods, added to the fact that they can be composed together, there is a need for a unified framework with a machine learning approach to perform model selection, validation, and parameter tuning while reducing the risk of data leakage and overfitting.

This framework is built on scikit-learn's API.

Available models

  • Portfolio Optimization:
    • Naive:
      • Equal-Weighted
      • Inverse-Volatility
      • Random (Dirichlet)
    • Convex:
      • Mean-Risk
      • Risk Budgeting
      • Maximum Diversification
      • Distributionally Robust CVaR
    • Clustering:
      • Hierarchical Risk Parity
      • Hierarchical Equal Risk Contribution
      • Nested Clusters Optimization
    • Ensemble Methods:
      • Stacking Optimization
  • Expected Returns Estimator:
    • Empirical
    • Exponentially Weighted
    • Equilibrium
    • Shrinkage
  • Covariance Estimator:
    • Empirical
    • Gerber
    • Denoising
    • Detoning
    • Exponentially Weighted
    • Ledoit-Wolf
    • Oracle Approximating Shrinkage
    • Shrunk Covariance
    • Graphical Lasso CV
    • Implied Covariance
  • Distance Estimator:
    • Pearson Distance
    • Kendall Distance
    • Spearman Distance
    • Covariance Distance (based on any of the above covariance estimators)
    • Distance Correlation
    • Variation of Information
  • Prior Estimator:
    • Empirical
    • Black & Litterman
    • Factor Model
  • Uncertainty Set Estimator:
    • On Expected Returns:
      • Empirical
      • Circular Bootstrap
    • On Covariance:
      • Empirical
      • Circular bootstrap
  • Pre-Selection Transformer:
    • Non-Dominated Selection
    • Select K Extremes (Best or Worst)
    • Drop Highly Correlated Assets
  • Cross-Validation and Model Selection:
    • Compatible with all sklearn methods (KFold, etc.)
    • Walk Forward
    • Combinatorial Purged Cross-Validation
  • Hyper-Parameter Tuning:
    • Compatible with all sklearn methods (GridSearchCV, RandomizedSearchCV)
  • Risk Measures:
    • Variance
    • Semi-Variance
    • Mean Absolute Deviation
    • First Lower Partial Moment
    • CVaR (Conditional Value at Risk)
    • EVaR (Entropic Value at Risk)
    • Worst Realization
    • CDaR (Conditional Drawdown at Risk)
    • Maximum Drawdown
    • Average Drawdown
    • EDaR (Entropic Drawdown at Risk)
    • Ulcer Index
    • Gini Mean Difference
    • Value at Risk
    • Drawdown at Risk
    • Entropic Risk Measure
    • Fourth Central Moment
    • Fourth Lower Partial Moment
    • Skew
    • Kurtosis
  • Optimization Features:
    • Minimize Risk
    • Maximize Returns
    • Maximize Utility
    • Maximize Ratio
    • Transaction Costs
    • Management Fees
    • L1 and L2 Regularization
    • Weight Constraints
    • Group Constraints
    • Budget Constraints
    • Tracking Error Constraints
    • Turnover Constraints

Quickstart

The code snippets below are designed to introduce the functionality of skfolio so you can start using it quickly. It follows the same API as scikit-learn.

Imports

from sklearn import set_config
from sklearn.model_selection import (
    GridSearchCV,
    KFold,
    RandomizedSearchCV,
    train_test_split,
)
from sklearn.pipeline import Pipeline
from scipy.stats import loguniform

from skfolio import RatioMeasure, RiskMeasure
from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.model_selection import (
    CombinatorialPurgedCV,
    WalkForward,
    cross_val_predict,
)
from skfolio.moments import (
    DenoiseCovariance,
    DetoneCovariance,
    EWMu,
    GerberCovariance,
    ShrunkMu,
)
from skfolio.optimization import (
    MeanRisk,
    NestedClustersOptimization,
    ObjectiveFunction,
    RiskBudgeting,
)
from skfolio.pre_selection import SelectKExtremes
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import BlackLitterman, EmpiricalPrior, FactorModel
from skfolio.uncertainty_set import BootstrapMuUncertaintySet

Load Dataset

prices = load_sp500_dataset()

Train/Test split

X = prices_to_returns(prices)
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)

Minimum Variance

model = MeanRisk()

Fit on Training Set

model.fit(X_train)

print(model.weights_)

Predict on Test Set

portfolio = model.predict(X_test)

print(portfolio.annualized_sharpe_ratio)
print(portfolio.summary())

Maximum Sortino Ratio

model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.SEMI_VARIANCE,
)

Denoised Covariance & Shrunk Expected Returns

model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=EmpiricalPrior(
        mu_estimator=ShrunkMu(), covariance_estimator=DenoiseCovariance()
    ),
)

Uncertainty Set on Expected Returns

model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    mu_uncertainty_set_estimator=BootstrapMuUncertaintySet(),
)

Weight Constraints & Transaction Costs

model = MeanRisk(
    min_weights={"AAPL": 0.10, "JPM": 0.05},
    max_weights=0.8,
    transaction_costs={"AAPL": 0.0001, "RRC": 0.0002},
    groups=[
        ["Equity"] * 3 + ["Fund"] * 5 + ["Bond"] * 12,
        ["US"] * 2 + ["Europe"] * 8 + ["Japan"] * 10,
    ],
    linear_constraints=[
        "Equity <= 0.5 * Bond",
        "US >= 0.1",
        "Europe >= 0.5 * Fund",
        "Japan <= 1",
    ],
)
model.fit(X_train)

Risk Parity on CVaR

model = RiskBudgeting(risk_measure=RiskMeasure.CVAR)

Risk Parity & Gerber Covariance

model = RiskBudgeting(
    prior_estimator=EmpiricalPrior(covariance_estimator=GerberCovariance())
)

Nested Cluster Optimization with Cross-Validation and Parallelization

model = NestedClustersOptimization(
    inner_estimator=MeanRisk(risk_measure=RiskMeasure.CVAR),
    outer_estimator=RiskBudgeting(risk_measure=RiskMeasure.VARIANCE),
    cv=KFold(),
    n_jobs=-1,
)

Randomized Search of the L2 Norm

randomized_search = RandomizedSearchCV(
    estimator=MeanRisk(),
    cv=WalkForward(train_size=252, test_size=60),
    param_distributions={
        "l2_coef": loguniform(1e-3, 1e-1),
    },
)
randomized_search.fit(X_train)

best_model = randomized_search.best_estimator_

print(best_model.weights_)

Grid Search on Embedded Parameters

model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE,
    prior_estimator=EmpiricalPrior(mu_estimator=EWMu(alpha=0.2)),
)

print(model.get_params(deep=True))

gs = GridSearchCV(
    estimator=model,
    cv=KFold(n_splits=5, shuffle=False),
    n_jobs=-1,
    param_grid={
        "risk_measure": [
            RiskMeasure.VARIANCE,
            RiskMeasure.CVAR,
            RiskMeasure.VARIANCE.CDAR,
        ],
        "prior_estimator__mu_estimator__alpha": [0.05, 0.1, 0.2, 0.5],
    },
)
gs.fit(X)

best_model = gs.best_estimator_

print(best_model.weights_)

Black & Litterman Model

views = ["AAPL - BBY == 0.03 ", "CVX - KO == 0.04", "MSFT == 0.06 "]
model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=BlackLitterman(views=views),
)

Factor Model

factor_prices = load_factors_dataset()

X, y = prices_to_returns(prices, factor_prices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=False)

model = MeanRisk(prior_estimator=FactorModel())
model.fit(X_train, y_train)

print(model.weights_)

portfolio = model.predict(X_test)

print(portfolio.calmar_ratio)
print(portfolio.summary())

Factor Model & Covariance Detoning

model = MeanRisk(
    prior_estimator=FactorModel(
        factor_prior_estimator=EmpiricalPrior(covariance_estimator=DetoneCovariance())
    )
)

Black & Litterman Factor Model

factor_views = ["MTUM - QUAL == 0.03 ", "VLUE == 0.06"]
model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    prior_estimator=FactorModel(
        factor_prior_estimator=BlackLitterman(views=factor_views),
    ),
)

Pre-Selection Pipeline

set_config(transform_output="pandas")
model = Pipeline(
    [
        ("pre_selection", SelectKExtremes(k=10, highest=True)),
        ("optimization", MeanRisk()),
    ]
)
model.fit(X_train)

portfolio = model.predict(X_test)

K-fold Cross-Validation

model = MeanRisk()
mmp = cross_val_predict(model, X_test, cv=KFold(n_splits=5))
# mmp is the predicted MultiPeriodPortfolio object composed of 5 Portfolios (1 per testing fold)

mmp.plot_cumulative_returns()
print(mmp.summary())

Combinatorial Purged Cross-Validation

model = MeanRisk()

cv = CombinatorialPurgedCV(n_folds=10, n_test_folds=2)

print(cv.get_summary(X_train))

population = cross_val_predict(model, X_train, cv=cv)

population.plot_distribution(
    measure_list=[RatioMeasure.SHARPE_RATIO, RatioMeasure.SORTINO_RATIO]
)
population.plot_cumulative_returns()
print(population.summary())

Recognition

We would like to thank all contributors behind our direct dependencies, such as scikit-learn and cvxpy, but also the contributors of the following resources that were a source of inspiration:

  • PyPortfolioOpt
  • Riskfolio-Lib
  • scikit-portfolio
  • microprediction
  • statsmodels
  • rsome
  • gautier.marti.ai

Citation

If you use skfolio in a scientific publication, we would appreciate citations:

Bibtex entry:

@misc{skfolio,
  author = {Delatte, Hugo and Nicolini, Carlo},
  title = {skfolio},
  year  = {2023},
  url   = {https://github.com/skfolio/skfolio}
}