TimeSeriesSVR input causes nan errors #440
I ran into this issue as well; thanks for reporting it. I found that the NaN values are generated inside the 'gak' kernel, yet the error message says the unsupported values are in 'Input X', which is confusing to users. Two fixes are needed: one for 'gak' itself and one for the misleading error message. My example used:
Gives first error in trace:
Last error in trace:
Changing the kernel to "rbf" results in no error.
I get a similar error in TimeSeriesSVC. Like @lomnes-atlast-food, I pinpointed the kernel as the root of the problem. However, I do think this is somewhat expected from an "alignment" kernel.
Describe the bug
Some inputs fail during fitting and raise the NaN error even though the data contains no NaNs. I provide a minimal example here that gives the same error as my larger dataset, although it may or may not have the same root cause. Note that my larger dataset is trimmed to avoid the 405-datapoint limit reported elsewhere.
To Reproduce
The following minimal example gives the error (see the traceback below):
from tslearn.svm import TimeSeriesSVR
from tslearn.utils import to_time_series_dataset
import numpy as np
X = to_time_series_dataset([ np.ones(3), np.ones(3)*2, np.ones(3)])
y = [0, 3, 0.1]
clf = TimeSeriesSVR(C=1., kernel="gak")
clf.fit(X, y)
Expected behavior
The fit should succeed, or at least fail with a clearer error message.
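To illustrate what a clearer error message could look like, here is a minimal sketch of a check run on a precomputed kernel (Gram) matrix before it reaches the SVM. The function name `check_kernel_matrix` and its message wording are hypothetical and are not part of tslearn or scikit-learn. The sketch only assumes that the kernel values are available as a 2-D array.

```python
import numpy as np


def check_kernel_matrix(K, kernel_name="gak"):
    """Hypothetical helper: fail early if a kernel matrix has NaN or inf entries.

    The error names the kernel as the source, so it is not mistaken for
    missing values in the user's input time series.
    """
    K = np.asarray(K, dtype=np.float64)
    bad = ~np.isfinite(K)  # True wherever an entry is NaN or +/-inf
    if bad.any():
        rows, cols = np.nonzero(bad)
        raise ValueError(
            f"Kernel '{kernel_name}' produced {int(bad.sum())} non-finite "
            f"value(s) in the kernel matrix (first at index "
            f"({int(rows[0])}, {int(cols[0])})). The input time series "
            "themselves may not contain NaN."
        )
    return K
```

With a check like this, the failure would point at the kernel computation rather than at 'Input X'.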
Environment (please complete the following information):
Additional context
Here is the error traceback:
5 y = [0, 3, 0.1]
6 clf = TimeSeriesSVR(C=1., kernel="gak")
----> 7 clf.fit(X, y)
File ~\PyVenvs\test2\lib\site-packages\tslearn\svm\svm.py:552, in TimeSeriesSVR.fit(self, X, y, sample_weight)
544 sklearn_X, y = self.preprocess_sklearn(X, y, fit_time=True)
546 self.svm_estimator_ = SVR(
547 C=self.C, kernel=self.estimator_kernel_, degree=self.degree,
548 gamma=self.gamma_, coef0=self.coef0, shrinking=self.shrinking,
549 tol=self.tol, cache_size=self.cache_size,
550 verbose=self.verbose, max_iter=self.max_iter
551 )
--> 552 self.svm_estimator_.fit(sklearn_X, y, sample_weight=sample_weight)
553 return self
File ~\PyVenvs\test2\lib\site-packages\sklearn\svm\_base.py:192, in BaseLibSVM.fit(self, X, y, sample_weight)
190 check_consistent_length(X, y)
191 else:
--> 192 X, y = self._validate_data(
193 X,
194 y,
195 dtype=np.float64,
196 order="C",
197 accept_sparse="csr",
198 accept_large_sparse=False,
199 )
201 y = self._validate_targets(y)
203 sample_weight = np.asarray(
204 [] if sample_weight is None else sample_weight, dtype=np.float64
205 )
File ~\PyVenvs\test2\lib\site-packages\sklearn\base.py:554, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
552 y = check_array(y, input_name="y", **check_y_params)
553 else:
--> 554 X, y = check_X_y(X, y, **check_params)
555 out = X, y
557 if not no_val_X and check_params.get("ensure_2d", True):
File ~\PyVenvs\test2\lib\site-packages\sklearn\utils\validation.py:1104, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
1099 estimator_name = _check_estimator_name(estimator)
1100 raise ValueError(
1101 f"{estimator_name} requires y to be passed, but the target y is None"
1102 )
-> 1104 X = check_array(
1105 X,
1106 accept_sparse=accept_sparse,
1107 accept_large_sparse=accept_large_sparse,
1108 dtype=dtype,
1109 order=order,
1110 copy=copy,
1111 force_all_finite=force_all_finite,
1112 ensure_2d=ensure_2d,
1113 allow_nd=allow_nd,
1114 ensure_min_samples=ensure_min_samples,
1115 ensure_min_features=ensure_min_features,
1116 estimator=estimator,
1117 input_name="X",
1118 )
1120 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)
1122 check_consistent_length(X, y)
File ~\PyVenvs\test2\lib\site-packages\sklearn\utils\validation.py:919, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
913 raise ValueError(
914 "Found array with dim %d. %s expected <= 2."
915 % (array.ndim, estimator_name)
916 )
918 if force_all_finite:
--> 919 _assert_all_finite(
920 array,
921 input_name=input_name,
922 estimator_name=estimator_name,
923 allow_nan=force_all_finite == "allow-nan",
924 )
926 if ensure_min_samples > 0:
927 n_samples = _num_samples(array)
File ~\PyVenvs\test2\lib\site-packages\sklearn\utils\validation.py:161, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
144 if estimator_name and input_name == "X" and has_nan_error:
145 # Improve the error message on how to handle missing values in
146 # scikit-learn.
147 msg_err += (
148 f"\n{estimator_name} does not accept missing values"
149 " encoded as NaN natively. For supervised learning, you might want"
(...)
159 "#estimators-that-handle-nan-values"
160 )
--> 161 raise ValueError(msg_err)
ValueError: Input X contains NaN.
SVR does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values