[ENH] Refactored and improved Catch22 transformer - support for column names, short aliases, refactor to pd.Series, sktime native parallelization #6002

Status: Merged (79 commits, merged Mar 5, 2024)

Commits:
e251486  simply feature names selection  (julnow, Feb 25, 2024)
b8cdde5  implement n_jobs ==-1 to use all cores  (julnow, Feb 25, 2024)
f6184cf  replace `list` with `List` in typing  (julnow, Feb 25, 2024)
a491c23  remove unnecesary empty lines  (julnow, Feb 25, 2024)
c5fc5de  fix linting  (julnow, Feb 26, 2024)
10aa0e2  Merge branch 'sktime:main' into main  (julnow, Feb 27, 2024)
1f04b23  remove unimplemented function  (julnow, Feb 27, 2024)
1afcfac  pass variables through dict  (julnow, Feb 27, 2024)
0a13fc2  Remove loops  (julnow, Feb 27, 2024)
7faaec2  implement short feature names  (julnow, Feb 27, 2024)
c8b14c4  fixed linting  (julnow, Feb 27, 2024)
2e26bb9  remove case match for compatibility with python 3.9  (julnow, Feb 27, 2024)
e520dfe  fix annotation  (julnow, Feb 27, 2024)
c987876  pass a dict to _get_feature_function  (julnow, Feb 27, 2024)
5757ea8  fix passing dict  (julnow, Feb 27, 2024)
40706a4  Seperate output col names logic  (julnow, Feb 27, 2024)
9a2dfbb  fix annotation  (julnow, Feb 27, 2024)
a6eadad  change type of dict to numba-native  (julnow, Feb 27, 2024)
7129ed3  isort fix  (julnow, Feb 27, 2024)
006995a  fix `module numba not found`  (julnow, Feb 27, 2024)
a7a363c  lazdy numba imports  (julnow, Feb 28, 2024)
b098905  Merge branch 'sktime:main' into main  (julnow, Feb 28, 2024)
1376d04  move creating numba dict to numba submodule  (julnow, Feb 28, 2024)
76532db  move creating numba dict to numba submodule  (julnow, Feb 28, 2024)
927c971  Merge branch 'main' of https://github.com/julnow/sktime  (julnow, Feb 28, 2024)
3f5e7e8  soft dependency isolation of numba types and Dict  (julnow, Feb 28, 2024)
b9d5b2d  np.fft.fft isn't njit compatible  (julnow, Feb 28, 2024)
1b04ebc  bring back _transform_features logic  (julnow, Feb 28, 2024)
ff069eb  update example to include column names  (julnow, Feb 28, 2024)
ff78ae1  update example to include column names  (julnow, Feb 28, 2024)
b4e974d  Merge branch 'main' of https://github.com/julnow/sktime  (julnow, Feb 28, 2024)
e4add72  Merge branch 'main' of https://github.com/julnow/sktime  (julnow, Feb 28, 2024)
470428e  Merge branch 'main' of https://github.com/julnow/sktime  (julnow, Feb 28, 2024)
7136e3b  Merge branch 'main' of https://github.com/julnow/sktime  (julnow, Feb 28, 2024)
c1d08a3  Merge branch 'main' of https://github.com/julnow/sktime  (julnow, Feb 28, 2024)
40a779e  include output in the example notebook  (julnow, Feb 28, 2024)
dba0db6  add more accurate examples in the notebook  (julnow, Feb 28, 2024)
94c963d  add _transform_single_feature back  (julnow, Feb 28, 2024)
a6115a0  dont transform X to np if is already ndarray  (julnow, Feb 28, 2024)
aaa0567  rerun notebook  (julnow, Feb 28, 2024)
3372d47  fix checking instance  (julnow, Feb 28, 2024)
52c0fb0  drop nopython mode from `_multiply_complex_arr`  (julnow, Feb 28, 2024)
c64433a  drop nopython mode from `_multiply_complex_arr`  (julnow, Feb 28, 2024)
61df8ac  Merge branch 'main' of https://github.com/julnow/sktime  (julnow, Feb 28, 2024)
a518335  revert zeros_like to zeros in ac  (julnow, Feb 29, 2024)
2da0649  added full compatibility for _transform_single_feature  (julnow, Feb 29, 2024)
a3ab23c  fix getting function  (julnow, Feb 29, 2024)
3f498d8  fix getting function  (julnow, Feb 29, 2024)
4bea8b9  make catch22 pass  (julnow, Feb 29, 2024)
a43799e  fix pydocstyle  (julnow, Feb 29, 2024)
28261e2  remove redundant code  (julnow, Feb 29, 2024)
e28c8d5  remove redundant code  (julnow, Feb 29, 2024)
c37f568  Revert "fix pydocstyle"  (julnow, Feb 29, 2024)
1bd4810  Delete examples/forecasting_results.csv  (julnow, Feb 29, 2024)
3a7a48e  remove unnesseray files  (julnow, Feb 29, 2024)
9019081  Delete sktime/utils/validation/__init__ copy.py  (julnow, Feb 29, 2024)
a1549b6  remove unnecessary files  (julnow, Feb 29, 2024)
e120b8e  Delete sktime/transformations/panel/test.ipynb  (julnow, Feb 29, 2024)
00afd0d  remove excess changes  (julnow, Feb 29, 2024)
7c99c4a  bring back `_FC_LocalSimple_mean1_tauresrat`  (julnow, Feb 29, 2024)
aa10635  pydocstyle  (julnow, Feb 29, 2024)
c07e52e  pydocstyle  (julnow, Feb 29, 2024)
fb94365  Merge branch 'sktime:main' into main  (julnow, Feb 29, 2024)
e72548e  Merge branch 'sktime:main' into main  (julnow, Mar 1, 2024)
af650ce  [AUTOMATED] update CONTRIBUTORS.md  (julnow, Mar 1, 2024)
a116f98  remove duplicated short feature name  (julnow, Mar 1, 2024)
e00efc5  accept short_feature names as feature arguments  (julnow, Mar 1, 2024)
1daf37d  more feature tests  (julnow, Mar 1, 2024)
9096649  auto col names  (julnow, Mar 3, 2024)
438f1d2  restore tags packaging header  (fkiraly, Mar 3, 2024)
15f1f30  Merge branch 'main' of https://github.com/julnow/sktime into pr/6002  (fkiraly, Mar 3, 2024)
c6a3181  Merge branch 'main' into pr/6002  (fkiraly, Mar 3, 2024)
9b23f9a  add catch24 features names  (julnow, Mar 4, 2024)
fcf8894  Update __init__.py  (julnow, Mar 4, 2024)
14d2936  add outlier_norm desc  (julnow, Mar 4, 2024)
f47adbc  remove trailing space  (julnow, Mar 4, 2024)
74fcf14  unify catch24 feature names  (julnow, Mar 5, 2024)
d7a108a  n_jobs arg should remain in its old position  (julnow, Mar 5, 2024)
3b9b636  add more tests  (julnow, Mar 5, 2024)
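Several commits in this list (`implement short feature names`, `accept short_feature names as feature arguments`, `remove duplicated short feature name`) concern short aliases for feature names, but their code is not shown on this page. The following is only a sketch, assuming parallel full/short name lists like the `FEATURE_NAMES` / `SHORT_FEATURE_NAMES` constants that appear in the diff below. The name values and the `short_to_full` / `canonical_name` helpers are placeholders for illustration, not sktime's API. The sketch also shows why a duplicated alias matters: it makes the short-to-full mapping ambiguous.

```python
from typing import Dict, List

# Placeholder name lists (assumed); the real constants live in catch22.py.
FEATURE_NAMES: List[str] = ["feat_alpha", "feat_beta"]
SHORT_FEATURE_NAMES: List[str] = ["alpha", "beta"]
CATCH24_FEATURE_NAMES: List[str] = ["feat_mean", "feat_std"]
CATCH24_SHORT_FEATURE_NAMES: List[str] = ["mean", "std"]


def short_to_full(catch24: bool = False) -> Dict[str, str]:
    """Build a short-alias -> full-name map, rejecting ambiguous aliases."""
    full = FEATURE_NAMES + CATCH24_FEATURE_NAMES if catch24 else FEATURE_NAMES
    short = (
        SHORT_FEATURE_NAMES + CATCH24_SHORT_FEATURE_NAMES
        if catch24
        else SHORT_FEATURE_NAMES
    )
    if len(full) != len(short):
        raise ValueError("full and short name lists must be parallel")
    if len(set(short)) != len(short):
        # A duplicated alias would map to more than one full name.
        raise ValueError("short feature names must be unique")
    return dict(zip(short, full))


def canonical_name(feature: str, catch24: bool = False) -> str:
    """Hypothetical helper: accept a full name or a short alias, return the full name."""
    full = FEATURE_NAMES + CATCH24_FEATURE_NAMES if catch24 else FEATURE_NAMES
    if feature in full:
        return feature
    aliases = short_to_full(catch24)
    if feature in aliases:
        return aliases[feature]
    raise KeyError(f"No feature with name: {feature}")
```

With `catch24=False` the two extra aliases are not in scope, so `canonical_name("std")` raises `KeyError` while `canonical_name("std", catch24=True)` resolves to `"feat_std"`.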
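
Commit 2e26bb9064d451c033cd9aef9b0f0c2f29accc01: "remove case match for compatibility with python 3.9"
julnow committed Feb 27, 2024

Structural pattern matching (`match`/`case`) requires Python 3.10 or later, so this commit rewrites both `match` blocks in the file as `isinstance` checks and `if`/`elif` chains.

sktime/transformations/panel/catch22.py: 93 changes (58 additions, 35 deletions)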
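The sketch below mirrors the `isinstance` dispatch that this commit substitutes for `match feature:` in `_get_feature_function`. It uses a toy registry in place of `METHODS_DICT` / `CATCH24_METHODS_DICT`; the stand-in feature functions, the `N_BASE` constant (playing the role of the literal `22`) and the `get_feature_function` name are assumptions for illustration. One behavioral detail is deliberately changed: in the diff, a string found in neither name list falls through both `if` checks and the method returns `None` (the `match` version behaved the same way), whereas the sketch raises `KeyError` for it.

```python
import statistics
from typing import Callable, Dict, List, Sequence, Union

FeatureFn = Callable[[Sequence[float]], float]

# Toy stand-ins (assumed) for FEATURE_NAMES, METHODS_DICT and their catch24 twins.
N_BASE = 2  # plays the role of the literal 22 in catch22.py
FEATURE_NAMES: List[str] = ["feat_first", "feat_last"]
CATCH24_FEATURE_NAMES: List[str] = ["extra_mean", "extra_max"]
METHODS_DICT: Dict[str, FeatureFn] = {
    "feat_first": lambda x: x[0],
    "feat_last": lambda x: x[-1],
}
CATCH24_METHODS_DICT: Dict[str, FeatureFn] = {
    "extra_mean": statistics.mean,
    "extra_max": max,
}


def get_feature_function(feature: Union[int, str]) -> FeatureFn:
    """Look up a feature callable by integer index or full name, without `match`."""
    if isinstance(feature, int):
        # Indices below N_BASE address the base set; the rest are offset into the extras.
        if feature < N_BASE:
            return METHODS_DICT[FEATURE_NAMES[feature]]
        return CATCH24_METHODS_DICT[CATCH24_FEATURE_NAMES[feature - N_BASE]]
    elif isinstance(feature, str):
        if feature in METHODS_DICT:
            return METHODS_DICT[feature]
        if feature in CATCH24_METHODS_DICT:
            return CATCH24_METHODS_DICT[feature]
        # Explicit raise: otherwise an unknown name would fall through and return None.
        raise KeyError(f"No feature with name: {feature}")
    else:
        raise KeyError(f"No feature with name: {feature}")
```

For example, `get_feature_function(2)([3.0, 1.0, 2.0])` resolves index 2 to the first extra feature (`extra_mean`) and returns `2.0`.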
@@ -220,11 +220,12 @@ def __init__(
         self.catch24 = catch24
         self.outlier_norm = outlier_norm
         self.replace_nans = replace_nans
-        self.col_names = col_names
-        self.n_jobs = n_jobs
+        self.col_names = self._set_col_names(col_names)
         self.f_idx = _verify_features(self.features, self.catch24)
         super().__init__()
 
+        self.n_jobs = n_jobs
+
         # todo 0.28.0: remove this warning and logic
         if n_jobs != "deprecated":
             warn(
@@ -238,6 +239,30 @@ def __init__(
             )
             self.set_config(backend="joblib", backend_params={"n_jobs": n_jobs})
 
+    def _set_col_names(self, col_names: str) -> str:
+        """Checks and returns col_names if one of:
+        ["range", "int_feat", "str_feat", "short_str_feat"].
+
+        Parameters
+        ----------
+        col_names : str with type of desired col_names
+
+        Returns
+        -------
+        col_names string which should be one of acceptable types.
+
+        Raises
+        -------
+        KeyError if not in accepted col_names types.
+        """
+        accepted_col_names = ["range", "int_feat", "str_feat", "short_str_feat"]
+        if col_names in accepted_col_names:
+            return col_names
+        else:
+            raise KeyError(
+                f"col_names type: {col_names} must be one of {accepted_col_names}"
+            )
+
     def _transform(self, X: pd.Series, y=None) -> pd.DataFrame:
         """Transform data into the Catch22 features.
 
@@ -258,20 +283,19 @@ def _transform(self, X: pd.Series, y=None) -> pd.DataFrame:
         return Xt
 
     def _get_feature_function(self, feature: Union[int, str]):
-        match feature:
-            case int():
-                return (
-                    METHODS_DICT.get(FEATURE_NAMES[feature])
-                    if feature < 22
-                    else CATCH24_METHODS_DICT.get(CATCH24_FEATURE_NAMES[feature - 22])
-                )
-            case str():
-                if feature in FEATURE_NAMES:
-                    return METHODS_DICT.get(feature)
-                if feature in CATCH24_FEATURE_NAMES:
-                    return CATCH24_METHODS_DICT.get(feature)
-            case _:
-                raise KeyError(f"No feature with name: {feature}")
+        if isinstance(feature, int):
+            return (
+                METHODS_DICT.get(FEATURE_NAMES[feature])
+                if feature < 22
+                else CATCH24_METHODS_DICT.get(CATCH24_FEATURE_NAMES[feature - 22])
+            )
+        elif isinstance(feature, str):
+            if feature in FEATURE_NAMES:
+                return METHODS_DICT.get(feature)
+            if feature in CATCH24_FEATURE_NAMES:
+                return CATCH24_METHODS_DICT.get(feature)
+        else:
+            raise KeyError(f"No feature with name: {feature}")
 
     def _transform_case(self, X: pd.Series, f_idx: List[int]) -> pd.DataFrame:
         """Transform data into the Catch22/24 features.
@@ -315,25 +339,24 @@ def _transform_case(self, X: pd.Series, f_idx: List[int]) -> pd.DataFrame:
         }
         col_names = self.col_names
 
-        match col_names:
-            case "range":
-                cols = range(n_features)
-            case "int_feat":
-                cols = f_idx
-            case "str_feat":
-                all_feature_names = (
-                    FEATURE_NAMES + CATCH24_FEATURE_NAMES
-                    if self.catch24
-                    else FEATURE_NAMES
-                )
-                cols = [all_feature_names[i] for i in f_idx]
-            case "short_str_feat":
-                all_short_feature_names = (
-                    SHORT_FEATURE_NAMES + CATCH24_SHORT_FEATURE_NAMES
-                    if self.catch24
-                    else SHORT_FEATURE_NAMES
-                )
-                cols = [all_short_feature_names[i] for i in f_idx]
+        if col_names == "range":
+            cols = range(n_features)
+        elif col_names == "int_feat":
+            cols = f_idx
+        elif col_names == "str_feat":
+            all_feature_names = (
+                FEATURE_NAMES + CATCH24_FEATURE_NAMES if self.catch24 else FEATURE_NAMES
+            )
+            cols = [all_feature_names[i] for i in f_idx]
+        elif col_names == "short_str_feat":
+            all_short_feature_names = (
+                SHORT_FEATURE_NAMES + CATCH24_SHORT_FEATURE_NAMES
+                if self.catch24
+                else SHORT_FEATURE_NAMES
+            )
+            cols = [all_short_feature_names[i] for i in f_idx]
+        else:
+            raise KeyError(f"Incorrect col_names type: {col_names}")
 
         for n, feature in enumerate(f_idx):
             Xt_np[0, n] = self._get_feature_function(feature)(variable_dict)
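The `if`/`elif` chain that replaces `match col_names:` in `_transform_case` maps each `col_names` mode to a list of output column labels. A standalone sketch of the same mapping makes the four modes easy to compare side by side. It assumes short placeholder name lists (not sktime's constants), a hypothetical `make_col_labels` helper, and that `n_features` equals `len(f_idx)` for the `"range"` mode.

```python
from typing import List, Union

# Placeholder name lists (assumed); catch22.py defines the real ones.
FEATURE_NAMES: List[str] = ["feat_a", "feat_b", "feat_c"]
SHORT_FEATURE_NAMES: List[str] = ["a", "b", "c"]
CATCH24_FEATURE_NAMES: List[str] = ["extra_mean", "extra_std"]
CATCH24_SHORT_FEATURE_NAMES: List[str] = ["mean", "std"]
ACCEPTED_COL_NAMES = ("range", "int_feat", "str_feat", "short_str_feat")


def make_col_labels(
    col_names: str, f_idx: List[int], catch24: bool
) -> List[Union[int, str]]:
    """Return output column labels for the selected feature indices."""
    if col_names == "range":
        # Positional labels 0..k-1 (assumes n_features == len(f_idx)).
        return list(range(len(f_idx)))
    elif col_names == "int_feat":
        # The feature indices themselves.
        return list(f_idx)
    elif col_names == "str_feat":
        # Catch24 names are appended after the base names, so indices line up.
        names = FEATURE_NAMES + CATCH24_FEATURE_NAMES if catch24 else FEATURE_NAMES
        return [names[i] for i in f_idx]
    elif col_names == "short_str_feat":
        names = (
            SHORT_FEATURE_NAMES + CATCH24_SHORT_FEATURE_NAMES
            if catch24
            else SHORT_FEATURE_NAMES
        )
        return [names[i] for i in f_idx]
    else:
        raise KeyError(
            f"col_names type: {col_names} must be one of {list(ACCEPTED_COL_NAMES)}"
        )
```

With `f_idx=[2, 3]` and `catch24=True`, the four modes give `[0, 1]`, `[2, 3]`, `["feat_c", "extra_mean"]` and `["c", "mean"]`. Because `_set_col_names` already rejects unknown modes in `__init__`, the final `else` branch in `_transform_case` acts as a defensive check rather than the primary validation.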