[ENH] Vendor fracdiff library (#6777)
#### Reference Issues/PRs
Resolves the core issue of #6700:
vendoring the fracdiff package.


#### What does this implement/fix? Explain your changes.
The [fracdiff](https://github.com/fracdiff/fracdiff) package has been
archived, which reduces the usability of sktime for some users.

This PR is the initial step in vendoring the library as a `libs`
component of sktime.
The intention of this initial PR is to have the tests pass with
minimal changes to the original library. However, some changes were
necessary:
- docs were removed from the original library
- examples were moved into the top-level examples directory
- the `fracdiff` tensor interface was removed
- the `FracdiffStat` class was removed (it depended on the `StatTester` class)
- the `StatTester` class was removed because it depended on `statsmodels`
- `FracdiffStat` seems useful, but I was told that `StatTester` should
be re-implemented, or replaced in `FracdiffStat` by some other tester of
"stationarity" that already exists in sktime (the code defaulted to "ADF").
I'm just not knowledgeable enough about what's happening here, but I can
work it out if there's an example somewhere that shows me how to use
that test and try to rewrite. YMMV.
- tests related to `StatTester` and `FracdiffStat` were then also
removed
- the import (and structure) was altered
    - the deprecation wrappers were removed
- the `Fracdiff` class is imported to top-level from the `sklearn` submodule because the
tensor wrappers were removed
- the numpy `fdiff` and `fdiff_coef` functions are also now imported to
top-level to avoid a triple `fdiff.fdiff.fdiff`
DinoBektesevic authored Aug 7, 2024
1 parent 580cdb1 commit 70e3b80
Showing 21 changed files with 3,538 additions and 3 deletions.
9 changes: 9 additions & 0 deletions .all-contributorsrc
@@ -2956,5 +2956,14 @@
"test"
]
},
{
"login": "DinoBektesevic",
"name": "Dino Bektesevic",
"avatar_url": "https://avatars.githubusercontent.com/u/29500910?v=4?s=100",
"profile": "https://github.com/DinoBektesevic",
"contributions": [
"code"
]
}
]
}
1 change: 1 addition & 0 deletions CONTRIBUTORS.md
@@ -369,6 +369,7 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d
<td align="center" valign="top" width="11.11%"><a href="https://github.com/vincent-nich12"><img src="https://avatars3.githubusercontent.com/u/36476633?v=4?s=100" width="100px;" alt="vincent-nich12"/><br /><sub><b>vincent-nich12</b></sub></a><br /><a href="https://github.com/sktime/sktime/commits?author=vincent-nich12" title="Code">💻</a></td>
<td align="center" valign="top" width="11.11%"><a href="https://github.com/vollmersj"><img src="https://avatars2.githubusercontent.com/u/12613127?v=4?s=100" width="100px;" alt="vollmersj"/><br /><sub><b>vollmersj</b></sub></a><br /><a href="https://github.com/sktime/sktime/commits?author=vollmersj" title="Documentation">📖</a></td>
<td align="center" valign="top" width="11.11%"><a href="https://github.com/xiaobenbenecho"><img src="https://avatars.githubusercontent.com/u/17461849?v=4?s=100" width="100px;" alt="xiaobenbenecho"/><br /><sub><b>xiaobenbenecho</b></sub></a><br /><a href="https://github.com/sktime/sktime/commits?author=xiaobenbenecho" title="Code">💻</a></td>
<td align="center" valign="top" width="11.11%"><a href="https://github.com/DinoBektesevic"><img src="https://avatars.githubusercontent.com/u/29500910?v=4?s=100" width="100px;" alt="DinoBektesevic"/><br /><sub><b>DinoBektesevic</b></sub></a><br /><a href="https://github.com/sktime/sktime/commits?author=DinoBektesevic" title="Code">💻</a></td>
</tr>
</tbody>
</table>
1,233 changes: 1,233 additions & 0 deletions examples/transformation/fracdiff/example_exercise.ipynb

Large diffs are not rendered by default.

405 changes: 405 additions & 0 deletions examples/transformation/fracdiff/example_howto.ipynb

Large diffs are not rendered by default.

851 changes: 851 additions & 0 deletions examples/transformation/fracdiff/example_prado.ipynb

Large diffs are not rendered by default.

Binary file added examples/transformation/fracdiff/fig/nky.png
Binary file added examples/transformation/fracdiff/fig/spx.png
41 changes: 41 additions & 0 deletions examples/transformation/fracdiff/fig/spx.py
@@ -0,0 +1,41 @@
"""Fractional differentiation of S&P 500."""

import sys

import matplotlib.pyplot as plt
import pandas as pd
import pandas_datareader
import seaborn

sys.path.append("../..")
from fracdiff import Fracdiff # noqa: E402


def fetch_spx():
    """Fetch 'Adj Close' value of Yahoo stocks."""
    return pandas_datareader.data.DataReader(
        "^GSPC", "yahoo", "1999-10-01", "2020-09-30"
    )["Adj Close"]


if __name__ == "__main__":
    s = fetch_spx()

    # 0.5th-order differentiation with a 100-step window
    f = Fracdiff(0.5, window=100, mode="valid")
    d = f.fit_transform(s.values.reshape(-1, 1)).reshape(-1)

    # mode="valid" drops the first window - 1 samples, so align the index
    s = s[100 - 1 :]
    d = pd.Series(d, index=s.index)

    # plot the price (left axis) and its fractional derivative (right axis)
    seaborn.set_style("white")
    fig, ax_s = plt.subplots(figsize=(16, 8))
    ax_d = ax_s.twinx()
    plot_s = ax_s.plot(s, color="blue", linewidth=0.6, label="S&P 500 (left)")
    plot_d = ax_d.plot(
        d, color="orange", linewidth=0.6, label="S&P 500, 0.5th differentiation (right)"
    )
    plots = plot_s + plot_d
    plt.title("S&P 500 and its fractional differentiation")
    ax_s.legend(plots, [p.get_label() for p in (plots)], loc=0)
    plt.savefig("spx.png", bbox_inches="tight", pad_inches=0.1)
    plt.close()
13 changes: 10 additions & 3 deletions sktime/libs/README.md
@@ -2,18 +2,25 @@

This folder contains libraries directly distributed with, and maintained by, `sktime`.

* `fracdiff` - a package implementing fractional differentiation of time series,
a la "Advances in Financial Machine Learning" by M. Prado.
Unofficial fork of abandoned package from July 2024,
see [issue 6700](https://github.com/sktime/sktime/issues/6700).

* `pykalman` - a package implementing the Kálmán Filter and variants.
Unofficial fork of abandoned package from June 2024 onwards,
see [pykalman issue 109](https://github.com/pykalman/pykalman/issues/109).

* `vmdpy` - a package implementing Variational Mode Decomposition.
Official fork, `vmdpy` is maintained in `sktime` since August 2023.



# Snippets from other libraries:

This folder also contains some private snippets from other libraries,
in folders starting with underscore. These should not be accessed by users of `sktime` directly.

* `_aws_fortuna-enbpi` - Parts of the `EnbPI` class from aws-fortuna.
The installation of the original package is not working due to dependency
mismatches.
204 changes: 204 additions & 0 deletions sktime/libs/fracdiff/README.md
@@ -0,0 +1,204 @@
# Fracdiff: Super-fast Fractional Differentiation

[![python versions](https://img.shields.io/pypi/pyversions/fracdiff.svg)](https://pypi.org/project/fracdiff)
[![version](https://img.shields.io/pypi/v/fracdiff.svg)](https://pypi.org/project/fracdiff)
[![CI](https://github.com/fracdiff/fracdiff/actions/workflows/ci.yml/badge.svg)](https://github.com/fracdiff/fracdiff/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/fracdiff/fracdiff/branch/main/graph/badge.svg)](https://codecov.io/gh/fracdiff/fracdiff)
[![dl](https://img.shields.io/pypi/dm/fracdiff)](https://pypi.org/project/fracdiff)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[Documentation](https://fracdiff.github.io/fracdiff/)

***Fracdiff*** performs fractional differentiation of time-series,
a la "Advances in Financial Machine Learning" by M. Prado.
Fractional differentiation transforms a time-series into a stationary one while preserving memory of the original time-series.
Fracdiff features super-fast computation and a scikit-learn compatible API.

![spx](./examples/fig/spx.png)

## What is fractional differentiation?

See [M. L. Prado, "Advances in Financial Machine Learning"][prado].
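
As a brief orientation, here is the standard textbook form of the operation (not taken from this README). Fractional differentiation of order `d` applies the binomial expansion of `(1 - B)^d`, where `B` is the backshift operator. The symbols `B`, `X_t` and `\omega_k` are notation assumed here for illustration.

```latex
(1 - B)^d X_t = \sum_{k=0}^{\infty} \omega_k X_{t-k},
\qquad \omega_0 = 1,
\qquad \omega_k = -\omega_{k-1} \, \frac{d - k + 1}{k} \quad (k \ge 1)
```

For `d = 1` the weights are `1, -1, 0, 0, ...`, which recovers the ordinary first difference. For non-integer `d` the weights never vanish and shrink only gradually, which is why the differentiated series keeps some memory of distant past values. In practice the sum is truncated to a finite window.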

## Installation

```sh
pip install fracdiff
```

## Features

### Functionalities

- [`fdiff`][doc-fdiff]: A function that extends [`numpy.diff`](https://numpy.org/doc/stable/reference/generated/numpy.diff.html) to fractional differentiation.
- [`sklearn.Fracdiff`][doc-sklearn.Fracdiff]: A scikit-learn [transformer](https://scikit-learn.org/stable/modules/generated/sklearn.base.TransformerMixin.html) to compute fractional differentiation.
- [`sklearn.FracdiffStat`][doc-sklearn.FracdiffStat]: `Fracdiff` plus automatic choice of differentiation order that makes time-series stationary.
- [`torch.fdiff`][doc-torch.fdiff]: A functional that extends [`torch.diff`](https://pytorch.org/docs/stable/generated/torch.diff.html) to fractional differentiation.
- [`torch.Fracdiff`][doc-torch.Fracdiff]: A module that computes fractional differentiation.

[doc-fdiff]: https://fracdiff.github.io/fracdiff/generated/fracdiff.fdiff.html
[doc-sklearn.Fracdiff]: https://fracdiff.github.io/fracdiff/generated/fracdiff.sklearn.Fracdiff.html
[doc-sklearn.FracdiffStat]: https://fracdiff.github.io/fracdiff/generated/fracdiff.sklearn.FracdiffStat.html
[doc-torch.fdiff]: https://fracdiff.github.io/fracdiff/generated/fracdiff.torch.fdiff.html
[doc-torch.Fracdiff]: https://fracdiff.github.io/fracdiff/generated/fracdiff.torch.Fracdiff.html

### Speed

Fracdiff is blazingly fast.

The following graphs show that *Fracdiff* computes fractional differentiation much faster than the "official" implementation.

It is especially noteworthy that execution time does not increase significantly as the number of features (`n_features`) increases, thanks to the NumPy engine.

![time](https://user-images.githubusercontent.com/24503967/128821902-d38c2f46-989c-44e7-bd71-899f95553696.png)

The following tables of execution times (in ms) show that *Fracdiff* can be ~10000 times faster than the "official" implementation.

| n_samples | fracdiff | official |
|------------:|:----------------|:--------------------|
| 100 | 0.675 +-0.086 | 20.008 +-1.472 |
| 1000 | 5.081 +-0.426 | 135.613 +-3.415 |
| 10000 | 50.644 +-0.574 | 1310.033 +-17.708 |
| 100000 | 519.969 +-8.166 | 13113.457 +-105.274 |

| n_features | fracdiff | official |
|-------------:|:---------------|:---------------------|
| 1 | 5.081 +-0.426 | 135.613 +-3.415 |
| 10 | 6.146 +-0.247 | 1350.161 +-15.195 |
| 100 | 6.903 +-0.654 | 13675.023 +-193.960 |
| 1000 | 13.783 +-0.700 | 136610.030 +-540.572 |

(Run on Ubuntu 20.04, Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz. See [fracdiff/benchmark](https://github.com/fracdiff/benchmark/releases/tag/1115171075) for details.)

## How to use

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fracdiff/fracdiff/blob/main/examples/example_howto.ipynb)

### Fractional differentiation

A function [`fdiff`](https://fracdiff.github.io/fracdiff/#fdiff) calculates fractional differentiation.
This is an extension of `numpy.diff` to a fractional order.

```python
import numpy as np
from fracdiff import fdiff

a = np.array([1, 2, 4, 7, 0])
fdiff(a, 0.5)
# array([ 1. , 1.5 , 2.875 , 4.6875 , -4.1640625])
np.array_equal(fdiff(a, n=1), np.diff(a, n=1))
# True

a = np.array([[1, 3, 6, 10], [0, 5, 6, 8]])
fdiff(a, 0.5, axis=0)
# array([[ 1. , 3. , 6. , 10. ],
# [-0.5, 3.5, 3. , 3. ]])
fdiff(a, 0.5, axis=-1)
# array([[1. , 2.5 , 4.375 , 6.5625],
# [0. , 5. , 3.5 , 4.375 ]])
```
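
To make the numbers above less opaque, the following is a minimal NumPy sketch of how such values can arise, assuming a truncated binomial-weight convolution with zeros before the first sample. The names `fdiff_coef_sketch` and `fdiff_sketch` are hypothetical and are not the library's implementation; the real `fdiff` supports further options such as `axis` and `mode`.

```python
import numpy as np


def fdiff_coef_sketch(d, window):
    """Return the first `window` weights of the expansion of (1 - B)**d."""
    coef = np.empty(window)
    coef[0] = 1.0
    for k in range(1, window):
        # Recursive form of the signed binomial coefficients.
        coef[k] = -coef[k - 1] * (d - k + 1) / k
    return coef


def fdiff_sketch(a, d, window=10):
    """Fractionally differentiate a 1d array (illustrative only).

    Values before a[0] are treated as zero, and the output has the
    same length as the input.
    """
    a = np.asarray(a, dtype=float)
    coef = fdiff_coef_sketch(d, window)
    # Full convolution, truncated to the input length.
    return np.convolve(a, coef)[: len(a)]


a = np.array([1, 2, 4, 7, 0])
print(fdiff_coef_sketch(0.5, 5))  # weights for d = 0.5
print(fdiff_sketch(a, 0.5))       # compare with fdiff(a, 0.5) above
```

With `d = 1` the weights reduce to `1, -1, 0, ...`, so every element after the first matches `np.diff(a)`.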

### Scikit-learn API

#### Preprocessing by fractional differentiation

A transformer class [`Fracdiff`](https://fracdiff.github.io/fracdiff/#id1) performs fractional differentiation by its method `transform`.

```python
from fracdiff.sklearn import Fracdiff

X = ... # 2d time-series with shape (n_samples, n_features)

f = Fracdiff(0.5)
X = f.fit_transform(X)
```

For example, 0.5th differentiation of S&P 500 historical price looks like this:

![spx](./examples/fig/spx.png)

[`Fracdiff`](https://fracdiff.github.io/fracdiff/#id1) is compatible with scikit-learn API.
One can incorporate it into a pipeline.

```python
from fracdiff.sklearn import Fracdiff
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

X, y = ...  # Dataset

# scale, fractionally differentiate, then regress
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('fracdiff', Fracdiff(0.5)),
    ('regressor', LinearRegression()),
])
pipeline.fit(X, y)

#### Fractional differentiation while preserving memory

A transformer class [`FracdiffStat`](https://fracdiff.github.io/fracdiff/#fracdiffstat) finds the minimum order of fractional differentiation that makes a time-series stationary.
The time-series differentiated with this order is then obtained by applying the `transform` method.
This series can be interpreted as a stationary time-series that keeps the maximum memory of the original time-series.

```python
from fracdiff.sklearn import FracdiffStat

X = ... # 2d time-series with shape (n_samples, n_features)

f = FracdiffStat()
X = f.fit_transform(X)
f.d_
# array([0.71875 , 0.609375, 0.515625])
```
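
The search for the minimum order can be pictured as a bisection over `d` with a pluggable stationarity check. The sketch below is an illustration under stated assumptions, not the library's code: `find_min_order`, `fdiff_sketch`, and the arguments `lower`, `upper` and `precision` are hypothetical names, and `looks_stationary` is a crude lag-1 autocorrelation stand-in for a real test such as ADF.

```python
import numpy as np


def fdiff_sketch(a, d, window=10):
    """Illustrative fractional differentiation of a 1d array."""
    a = np.asarray(a, dtype=float)
    coef = np.ones(window)
    for k in range(1, window):
        coef[k] = -coef[k - 1] * (d - k + 1) / k
    return np.convolve(a, coef)[: len(a)]


def looks_stationary(x, threshold=0.5):
    """Hypothetical stand-in for a real stationarity test (e.g. ADF)."""
    x = x - x.mean()
    lag1 = np.dot(x[:-1], x[1:]) / np.dot(x, x)
    return lag1 < threshold


def find_min_order(x, is_stationary, lower=0.0, upper=1.0, precision=0.01):
    """Bisect for the smallest order in [lower, upper] passing the check."""
    if is_stationary(fdiff_sketch(x, lower)):
        return lower
    if not is_stationary(fdiff_sketch(x, upper)):
        return np.nan  # no order in the range passes the check
    while upper - lower > precision:
        mid = (lower + upper) / 2
        if is_stationary(fdiff_sketch(x, mid)):
            upper = mid  # mid passes: try a smaller order
        else:
            lower = mid  # mid fails: a larger order is needed
    return upper


rng = np.random.default_rng(0)
x = rng.standard_normal(1000).cumsum()  # random walk, non-stationary
d = find_min_order(x, looks_stationary)
print(d)
```

Passing the check as a callable keeps the search independent of any particular test, so a different stationarity tester could be swapped in without touching the bisection. Bisection assumes the check flips from failing to passing only once as `d` grows.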

The result for Nikkei 225 index historical price looks like this:

![nky](./examples/fig/nky.png)


### PyTorch API

One can also apply fracdiff to a PyTorch tensor and benefit from GPU acceleration.

```py
import torch
from fracdiff.torch import fdiff

input = torch.tensor(...)
output = fdiff(input, 0.5)
```

```py
import torch
from fracdiff.torch import Fracdiff

module = Fracdiff(0.5)
module
# Fracdiff(0.5, dim=-1, window=10, mode='same')

input = torch.tensor(...)
output = module(input)
```

### More Examples

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fracdiff/fracdiff/blob/main/examples/example_prado.ipynb)

More examples are provided [here](examples/example_prado.ipynb).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fracdiff/fracdiff/blob/main/examples/example_exercise.ipynb)

Example solutions of exercises in Section 5 of "Advances in Financial Machine Learning" are provided [here](examples/example_exercise.ipynb).

## Contributing

Any contributions are more than welcome.

The maintainer (simaki) is not making further enhancements and appreciates pull requests to make them.
See [Issue](https://github.com/fracdiff/fracdiff/issues) for proposed features.
Please take a look at [CONTRIBUTING.md](.github/CONTRIBUTING.md) before creating a pull request.

## References

- [Marcos Lopez de Prado, "Advances in Financial Machine Learning", Wiley, (2018).][prado]

[prado]: https://www.wiley.com/en-us/Advances+in+Financial+Machine+Learning-p-9781119482086
44 changes: 44 additions & 0 deletions sktime/libs/fracdiff/__init__.py
@@ -0,0 +1,44 @@
"""Fractional difference, a la "Advances in Financial Machine Learning" by M. Prado.
Unofficial fork of the ``fracdiff`` package, maintained in ``sktime``.
sktime migration: 2024, July (DinoBektesevic)
Version 0.9.1 release: 2023, Feb 4 (simaki)
Original authors: Shota Imaki
The 2023 release subject to following license:
BSD 3-Clause License
Copyright (c) 2021, fracdiff
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""

from sktime.libs.fracdiff.fdiff import fdiff, fdiff_coef
from sktime.libs.fracdiff.sklearn.fracdiff import Fracdiff