Commit

[ENH] homogenization of sktime and skchange detection API - `predict`, `predict_points`, `predict_segments` (#7476)

This PR carries out some remaining steps of interface homogenization
with `skchange`:

* index of returns of `predict`, `predict_points`, `predict_segments` is
now always a `RangeIndex` - no key information requires index
manipulation anymore.
* return of `predict`, `predict_points`, `predict_segments` is now
always a `pd.DataFrame`, with one column `"ilocs"` for the event
indices, and additional potential columns in supervised and
semi-supervised cases
* return of `predict` is now identical with one of `predict_points` or
`predict_segments` in the supervised and semi-supervised case
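
For illustration, a minimal sketch of what returns in this format could look like, built with plain `pandas` only. The values are made up, and the name of the label column (`"labels"` here) is an assumption, since it is only one of the possible additional columns:

```python
import pandas as pd

# Hypothetical point events (e.g. change points) at iloc positions 3 and 7 of X.
# The default index of a freshly built DataFrame is a RangeIndex.
points = pd.DataFrame({"ilocs": [3, 7]})

# Hypothetical segments, stored as left-closed intervals of iloc positions.
# The "labels" column name is an assumption made for this sketch.
segments = pd.DataFrame(
    {
        "ilocs": pd.arrays.IntervalArray.from_breaks([0, 3, 7, 10], closed="left"),
        "labels": [0, 1, 0],
    }
)

print(points)
print(segments)
```

In both cases the event locations live in the `"ilocs"` column, so the index carries no information and can stay a plain `RangeIndex`.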

Also updates some estimators to function as expected:

* existing estimators through coercions to the desired output
* dummy estimators
fkiraly authored Dec 3, 2024
1 parent 1d84df2 commit 9bf1324
Showing 15 changed files with 326 additions and 80 deletions.
4 changes: 2 additions & 2 deletions examples/07_detection_anomaly_changepoints.ipynb
@@ -870,7 +870,7 @@
},
{
"cell_type": "code",
- "execution_count": 17,
+ "execution_count": null,
"id": "6fcf2dbd-0afb-4d11-b238-84294e811dba",
"metadata": {},
"outputs": [
@@ -905,7 +905,7 @@
" linestyle=\"--\",\n",
")\n",
"\n",
- "for i, cp in enumerate(predicted_change_points):\n",
+ "for i, cp in enumerate(predicted_change_points.values.flatten()):\n",
" label = \"Predicted Change Points\" if i == 0 else None\n",
" ax.axvline(cp, color=\"tab:green\", linestyle=\"--\", label=label)\n",
"\n",
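
The notebook change above follows from the new `pd.DataFrame` return type. A minimal sketch of why the loop needs `.values.flatten()`, assuming a made-up single-column frame in place of the notebook's `predicted_change_points`:

```python
import pandas as pd

# Hypothetical stand-in for the notebook variable; the positions are made up.
predicted_change_points = pd.DataFrame({"ilocs": [25, 60]})

# Iterating over a DataFrame yields its column labels, not its values.
iterated = list(predicted_change_points)

# Flattening the underlying array yields the iloc positions themselves.
positions = predicted_change_points.values.flatten()

print(iterated)
print(positions.tolist())
```

Selecting the column explicitly, as in `predicted_change_points["ilocs"]`, would be an alternative way to iterate over the positions.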
50 changes: 43 additions & 7 deletions extension_templates/detection.py
@@ -192,8 +192,29 @@ def _fit(self, X, y=None):
----------
X : pd.DataFrame
Training data to fit model to time series.
- y : pd.Series, optional
- Ground truth labels for training, if detector is supervised.
+ y : pd.DataFrame with RangeIndex
+ Known events for training, in ``X``, if detector is supervised.
+ Each row ``y`` is a known event.
+ Can have the following columns:
+ * ``"ilocs"`` - always. Values encode where/when the event takes place,
+ via ``iloc`` references to indices of ``X``,
+ or ranges ot indices of ``X``, as below.
+ * ``"label"`` - if the task, by tags, is supervised or semi-supervised
+ segmentation with labels, or segment clustering.
+ The meaning of entries in the ``"ilocs"`` column and ``"labels"``
+ column describe the event in a given row as follows:
+ * If ``task`` is ``"anomaly_detection"`` or ``"change_point_detection"``,
+ ``"ilocs"`` contains the iloc index at which the event takes place.
+ * If ``task`` is ``"segmentation"``, ``"ilocs"`` contains left-closed
+ intervals of iloc based segments, interpreted as the range
+ of indices over which the event takes place.
+ Labels (if present) in the ``"labels"`` column indicate the type of event.
Returns
-------
@@ -221,13 +242,28 @@ def _predict(self, X):
Returns
-------
- y : pd.Series with RangeIndex
- Labels for sequence ``X``, in sparse format.
- Values are ``iloc`` references to indices of ``X``.
+ y : pd.DataFrame with RangeIndex
+ Detected or predicted events.
+ Each row ``y`` is a detected or predicted event.
+ Can have the following columns:
+ * ``"ilocs"`` - always. Values encode where/when the event takes place,
+ via ``iloc`` references to indices of ``X``,
+ or ranges ot indices of ``X``, as below.
+ * ``"label"`` - if the task, by tags, is supervised or semi-supervised
+ segmentation with labels, or segment clustering.
+ The meaning of entries in the ``"ilocs"`` column and ``"labels"``
+ column describe the event in a given row as follows:
  * If ``task`` is ``"anomaly_detection"`` or ``"change_point_detection"``,
- the values are integer indices of the changepoints/anomalies.
- * If ``task`` is "segmentation", the values are ``pd.Interval`` objects.
+ ``"ilocs"`` contains the iloc index at which the event takes place.
+ * If ``task`` is ``"segmentation"``, ``"ilocs"`` contains left-closed
+ intervals of iloc based segments, interpreted as the range
+ of indices over which the event takes place.
+ Labels (if present) in the ``"labels"`` column indicate the type of event.
"""

# implement here
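
The docstring above describes segments as left-closed intervals of `iloc` positions. A minimal sketch of how such an interval could be mapped back to rows of `X`, assuming a toy `X` and a single made-up segment (neither is taken from the template):

```python
import pandas as pd

# Toy series with an obvious level shift at positions 2 to 4 (made-up data).
X = pd.DataFrame({"value": [1.0, 1.1, 5.0, 5.2, 5.1, 1.0]})

# One hypothetical segment event: iloc positions 2, 3, 4 (left-closed, right-open).
y = pd.DataFrame({"ilocs": [pd.Interval(2, 5, closed="left")]})

# A left-closed interval [left, right) corresponds to the slice left:right.
segment = y["ilocs"].iloc[0]
X_segment = X.iloc[segment.left : segment.right]

print(X_segment)
```

Because Python slices are also left-closed and right-open, the interval bounds can be used as slice bounds without any offset.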
2 changes: 1 addition & 1 deletion sktime/detection/adapters/_pyod.py
@@ -102,7 +102,7 @@ def _predict(self, X):
Y_val_np = Y_np

Y_loc = np.where(Y_np)
- Y = pd.Series(Y_val_np[Y_loc], index=X.index[Y_loc])
+ Y = pd.Series(Y_val_np[Y_loc])

return Y
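
A minimal sketch of the index effect of this change, using made-up data and a hypothetical datetime index in place of `X.index`. It covers only how dropping the `index=` argument changes the index of the result, not the rest of the adapter logic, and it uses `np.where(...)[0]` rather than the tuple returned by `np.where` to keep the indexing one-dimensional:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for X.index and the per-timestamp detector output.
X_index = pd.date_range("2024-01-01", periods=5, freq="D")
Y_np = np.array([0, 1, 0, 0, 1])

# Positions of the nonzero (flagged) entries.
Y_loc = np.where(Y_np)[0]

# Before: the result was indexed by the corresponding entries of X.index.
old = pd.Series(Y_np[Y_loc], index=X_index[Y_loc])

# After: no index is passed, so pandas assigns a default RangeIndex.
new = pd.Series(Y_np[Y_loc])

print(old.index)
print(new.index)
```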
