[ENH] Refactored and improved Catch22 transformer - support for column names, short aliases, refactor to pd.Series, sktime native parallelization #6002

Merged
merged 79 commits into from
Mar 5, 2024
Commits
e251486
simply feature names selection
julnow Feb 25, 2024
b8cdde5
implement n_jobs ==-1 to use all cores
julnow Feb 25, 2024
f6184cf
replace `list` with `List` in typing
julnow Feb 25, 2024
a491c23
remove unnecesary empty lines
julnow Feb 25, 2024
c5fc5de
fix linting
julnow Feb 26, 2024
10aa0e2
Merge branch 'sktime:main' into main
julnow Feb 27, 2024
1f04b23
remove unimplemented function
julnow Feb 27, 2024
1afcfac
pass variables through dict
julnow Feb 27, 2024
0a13fc2
Remove loops
julnow Feb 27, 2024
7faaec2
implement short feature names
julnow Feb 27, 2024
c8b14c4
fixed linting
julnow Feb 27, 2024
2e26bb9
remove case match for compatibility with python 3.9
julnow Feb 27, 2024
e520dfe
fix annotation
julnow Feb 27, 2024
c987876
pass a dict to _get_feature_function
julnow Feb 27, 2024
5757ea8
fix passing dict
julnow Feb 27, 2024
40706a4
Seperate output col names logic
julnow Feb 27, 2024
9a2dfbb
fix annotation
julnow Feb 27, 2024
a6eadad
change type of dict to numba-native
julnow Feb 27, 2024
7129ed3
isort fix
julnow Feb 27, 2024
006995a
fix `module numba not found`
julnow Feb 27, 2024
a7a363c
lazdy numba imports
julnow Feb 28, 2024
b098905
Merge branch 'sktime:main' into main
julnow Feb 28, 2024
1376d04
move creating numba dict to numba submodule
julnow Feb 28, 2024
76532db
move creating numba dict to numba submodule
julnow Feb 28, 2024
927c971
Merge branch 'main' of https://github.com/julnow/sktime
julnow Feb 28, 2024
3f5e7e8
soft dependency isolation of numba types and Dict
julnow Feb 28, 2024
b9d5b2d
np.fft.fft isn't njit compatible
julnow Feb 28, 2024
1b04ebc
bring back _transform_features logic
julnow Feb 28, 2024
ff069eb
update example to include column names
julnow Feb 28, 2024
ff78ae1
update example to include column names
julnow Feb 28, 2024
b4e974d
Merge branch 'main' of https://github.com/julnow/sktime
julnow Feb 28, 2024
e4add72
Merge branch 'main' of https://github.com/julnow/sktime
julnow Feb 28, 2024
470428e
Merge branch 'main' of https://github.com/julnow/sktime
julnow Feb 28, 2024
7136e3b
Merge branch 'main' of https://github.com/julnow/sktime
julnow Feb 28, 2024
c1d08a3
Merge branch 'main' of https://github.com/julnow/sktime
julnow Feb 28, 2024
40a779e
include output in the example notebook
julnow Feb 28, 2024
dba0db6
add more accurate examples in the notebook
julnow Feb 28, 2024
94c963d
add _transform_single_feature back
julnow Feb 28, 2024
a6115a0
dont transform X to np if is already ndarray
julnow Feb 28, 2024
aaa0567
rerun notebook
julnow Feb 28, 2024
3372d47
fix checking instance
julnow Feb 28, 2024
52c0fb0
drop nopython mode from `_multiply_complex_arr`
julnow Feb 28, 2024
c64433a
drop nopython mode from `_multiply_complex_arr`
julnow Feb 28, 2024
61df8ac
Merge branch 'main' of https://github.com/julnow/sktime
julnow Feb 28, 2024
a518335
revert zeros_like to zeros in ac
julnow Feb 29, 2024
2da0649
added full compatibility for _transform_single_feature
julnow Feb 29, 2024
a3ab23c
fix getting function
julnow Feb 29, 2024
3f498d8
fix getting function
julnow Feb 29, 2024
4bea8b9
make catch22 pass
julnow Feb 29, 2024
a43799e
fix pydocstyle
julnow Feb 29, 2024
28261e2
remove redundant code
julnow Feb 29, 2024
e28c8d5
remove redundant code
julnow Feb 29, 2024
c37f568
Revert "fix pydocstyle"
julnow Feb 29, 2024
1bd4810
Delete examples/forecasting_results.csv
julnow Feb 29, 2024
3a7a48e
remove unnesseray files
julnow Feb 29, 2024
9019081
Delete sktime/utils/validation/__init__ copy.py
julnow Feb 29, 2024
a1549b6
remove unnecessary files
julnow Feb 29, 2024
e120b8e
Delete sktime/transformations/panel/test.ipynb
julnow Feb 29, 2024
00afd0d
remove excess changes
julnow Feb 29, 2024
7c99c4a
bring back `_FC_LocalSimple_mean1_tauresrat`
julnow Feb 29, 2024
aa10635
pydocstyle
julnow Feb 29, 2024
c07e52e
pydocstyle
julnow Feb 29, 2024
fb94365
Merge branch 'sktime:main' into main
julnow Feb 29, 2024
e72548e
Merge branch 'sktime:main' into main
julnow Mar 1, 2024
af650ce
[AUTOMATED] update CONTRIBUTORS.md
julnow Mar 1, 2024
a116f98
remove duplicated short feature name
julnow Mar 1, 2024
e00efc5
accept short_feature names as feature arguments
julnow Mar 1, 2024
1daf37d
more feature tests
julnow Mar 1, 2024
9096649
auto col names
julnow Mar 3, 2024
438f1d2
restore tags packaging header
fkiraly Mar 3, 2024
15f1f30
Merge branch 'main' of https://github.com/julnow/sktime into pr/6002
fkiraly Mar 3, 2024
c6a3181
Merge branch 'main' into pr/6002
fkiraly Mar 3, 2024
9b23f9a
add catch24 features names
julnow Mar 4, 2024
fcf8894
Update __init__.py
julnow Mar 4, 2024
14d2936
add outlier_norm desc
julnow Mar 4, 2024
f47adbc
remove trailing space
julnow Mar 4, 2024
74fcf14
unify catch24 feature names
julnow Mar 5, 2024
d7a108a
n_jobs arg should remain in its old position
julnow Mar 5, 2024
3b9b636
add more tests
julnow Mar 5, 2024
implement short feature names
julnow committed Feb 27, 2024
commit 7faaec27ffae08aff1b8cd493839ecc70402e9b9
66 changes: 53 additions & 13 deletions sktime/transformations/panel/catch22.py
@@ -70,10 +70,40 @@
     "SB_TransitionMatrix_3ac_sumdiagcov": _SB_TransitionMatrix_3ac_sumdiagcov,
     "PD_PeriodicityWang_th0_01": _PD_PeriodicityWang_th0_01,
 }
-CATCH24_METHODS_DICT = {"Mean": _catch24_mean, "StandardDeviation": _catch24_std}

 FEATURE_NAMES = list(METHODS_DICT.keys())
+SHORT_FEATURE_NAMES_DICT = {
+    "DN_HistogramMode_5": "mode_5",
+    "DN_HistogramMode_10": "mode_10",
+    "SB_BinaryStats_diff_longstretch0": "stretch_decreasing",
+    "DN_OutlierInclude_p_001_mdrmd": "outlier_timing_pos",
+    "DN_OutlierInclude_n_001_mdrmd": "outlier_timing_neg",
+    "CO_f1ecac": "acf_timescale",
+    "CO_FirstMin_ac": "acf_first_min",
+    "SP_Summaries_welch_rect_area_5_1": "centroid_freq",
+    "SP_Summaries_welch_rect_centroid": "low_freq_power",
+    "FC_LocalSimple_mean3_stderr": "forecast_error",
+    "CO_trev_1_num": "trev",
+    "CO_HistogramAMI_even_2_5": "ami2",
+    "IN_AutoMutualInfoStats_40_gaussian_fmmi": "ami_timescale",
+    "MD_hrv_classic_pnn40": "high_fluctuation",
+    "SB_BinaryStats_mean_longstretch1": "stretch_high",
+    "SB_MotifThree_quantile_hh": "rs_range",
+    "FC_LocalSimple_mean1_tauresrat": "whiten_timescale",
+    "CO_Embed2_Dist_tau_d_expfit_meandiff": "embedding_dist",
+    "SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1": "dfa",
+    "SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1": "rs_range",
+    "SB_TransitionMatrix_3ac_sumdiagcov": "transition_matrix",
+    "PD_PeriodicityWang_th0_01": "periodicity",
+}
+SHORT_FEATURE_NAMES = list(SHORT_FEATURE_NAMES_DICT.values())
+
+CATCH24_METHODS_DICT = {"Mean": _catch24_mean, "StandardDeviation": _catch24_std}
 CATCH24_FEATURE_NAMES = list(CATCH24_METHODS_DICT.keys())
+CATCH24_SHORT_FEATURE_NAMES_DICT = {
+    "Mean": "mean",
+    "StandardDeviation": "std",
+}
+CATCH24_SHORT_FEATURE_NAMES = list(CATCH24_SHORT_FEATURE_NAMES_DICT.values())


 def _verify_features(
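Review note: in the mapping added in this hunk, the short name "rs_range" is assigned to two long names (`SB_MotifThree_quantile_hh` and `SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1`), so `col_names="short_str_feat"` could yield two identically named columns. A later commit in this PR (a116f98, "remove duplicated short feature name") presumably addresses this. The sketch below shows a uniqueness check that could guard such a mapping. `find_duplicate_values` is a hypothetical helper, not part of sktime, and the dict is a three-entry excerpt of the mapping in this diff.

```python
from collections import Counter
from typing import Dict, List


def find_duplicate_values(mapping: Dict[str, str]) -> List[str]:
    """Return the values that occur more than once in ``mapping``, sorted."""
    counts = Counter(mapping.values())
    return sorted(value for value, n in counts.items() if n > 1)


# Three entries copied from SHORT_FEATURE_NAMES_DICT as it appears in this commit.
excerpt = {
    "SB_MotifThree_quantile_hh": "rs_range",
    "SC_FluctAnal_2_dfa_50_1_2_logi_prop_r1": "dfa",
    "SC_FluctAnal_2_rsrangefit_50_1_logi_prop_r1": "rs_range",
}

duplicates = find_duplicate_values(excerpt)
print(duplicates)
```

A check of this kind could run in a unit test over the full mapping, so that a repeated short name is caught before it reaches the output column index.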
@@ -142,8 +172,8 @@ class Catch22(BaseTransformer):
         If "int_feat", column names will be the integer feature indices,
         as defined in pycatch22.
         If "str_feat", column names will be the string feature names.
-        # If "short_str_feat", column names will be the short string feature names
-        # as defined in pycatch22.
+        If "short_str_feat", column names will be the short string feature names
+        as defined in pycatch22.

     See Also
     --------
@@ -288,15 +318,25 @@ def _transform_case(self, X: pd.Series, f_idx: List[int]) -> pd.DataFrame:
         }
         col_names = self.col_names

-        if col_names == "range":
-            cols = range(n_features)
-        elif col_names == "int_feat":
-            cols = f_idx
-        elif col_names == "str_feat":
-            all_feature_names = (
-                FEATURE_NAMES + CATCH24_FEATURE_NAMES if self.catch24 else FEATURE_NAMES
-            )
-            cols = [all_feature_names[i] for i in f_idx]
+        match col_names:
+            case "range":
+                cols = range(n_features)
+            case "int_feat":
+                cols = f_idx
+            case "str_feat":
+                all_feature_names = (
+                    FEATURE_NAMES + CATCH24_FEATURE_NAMES
+                    if self.catch24
+                    else FEATURE_NAMES
+                )
+                cols = [all_feature_names[i] for i in f_idx]
+            case "short_str_feat":
+                all_short_feature_names = (
+                    SHORT_FEATURE_NAMES + CATCH24_SHORT_FEATURE_NAMES
+                    if self.catch24
+                    else SHORT_FEATURE_NAMES
+                )
+                cols = [all_short_feature_names[i] for i in f_idx]

         for n, feature in enumerate(f_idx):
             Xt_np[0, n] = self._get_feature_function(feature)(variable_dict)
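Review note: the `match` statement requires Python 3.10 or later, and a later commit in this PR (2e26bb9, "remove case match for compatibility with python 3.9") removes it. The block added here also has no fallback case, so an unrecognised `col_names` value would leave `cols` unassigned. Below is a minimal sketch of an equivalent that runs on Python 3.9 and fails loudly on unknown values. `resolve_columns` and the shortened name lists are hypothetical stand-ins for illustration, not the code that was merged, and the sketch assumes `n_features` equals `len(f_idx)`.

```python
from typing import List, Sequence, Union

# Shortened, illustrative stand-ins for the module-level name lists.
FEATURE_NAMES = ["DN_HistogramMode_5", "DN_HistogramMode_10"]
SHORT_FEATURE_NAMES = ["mode_5", "mode_10"]
CATCH24_FEATURE_NAMES = ["Mean", "StandardDeviation"]
CATCH24_SHORT_FEATURE_NAMES = ["mean", "std"]


def resolve_columns(
    col_names: str, f_idx: Sequence[int], catch24: bool = False
) -> List[Union[int, str]]:
    """Map feature indices to output column labels without ``match``."""
    if col_names == "range":
        # Assumption: one output column per requested feature index.
        return list(range(len(f_idx)))
    if col_names == "int_feat":
        return list(f_idx)

    # Both string modes index into a list of names; only the list differs.
    name_lists = {
        "str_feat": FEATURE_NAMES + (CATCH24_FEATURE_NAMES if catch24 else []),
        "short_str_feat": SHORT_FEATURE_NAMES
        + (CATCH24_SHORT_FEATURE_NAMES if catch24 else []),
    }
    if col_names not in name_lists:
        raise ValueError(f"unknown col_names value: {col_names!r}")
    names = name_lists[col_names]
    return [names[i] for i in f_idx]


print(resolve_columns("short_str_feat", [1, 3], catch24=True))
```

Keeping the two string modes in one lookup table means a further naming scheme would only need one more dictionary entry, and the explicit `ValueError` replaces the silent fall-through.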