Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inappropriate Error when a string is given for drift methods #165

Closed
nikml opened this issue Dec 5, 2022 · 1 comment
Closed

Inappropriate Error when a string is given for drift methods #165

nikml opened this issue Dec 5, 2022 · 1 comment
Assignees
Labels
bug Something isn't working

Comments

@nikml
Copy link
Contributor

nikml commented Dec 5, 2022

# missing code that creates relevant dataframes
feature_column_names = train.drop(['fraud_reported', 'timestamp'],axis = 1).columns

univariate_calculator = nml.UnivariateDriftCalculator(
    column_names=list(feature_column_names),
    continuous_methods='jensen_shannon')

univariate_calculator = univariate_calculator.fit(test)
univariate_results = univariate_calculator.calculate(val)

---------------------------------------------------------------------------
InvalidArgumentsException                 Traceback (most recent call last)
Cell In[23], line 7
      1 feature_column_names = train.drop(['fraud_reported', 'timestamp'],axis = 1).columns
      3 univariate_calculator = nml.UnivariateDriftCalculator(
      4     column_names=list(feature_column_names),
      5     continuous_methods='kolmogorov_smirnov')
----> 7 univariate_calculator = univariate_calculator.fit(test)
      8 univariate_results = univariate_calculator.calculate(val)
     10 # for column_name in univariate_calculator.continuous_column_names:
     11 #     figure = univariate_results.plot(
     12 #         kind='drift',
   (...)
     16 #     )
     17 #     figure.show()

File ~/.conda/envs/nanny/lib/python3.10/site-packages/nannyml/base.py:138, in AbstractCalculator.fit(self, reference_data, *args, **kwargs)
    136 try:
    137     self._logger.debug(f"fitting {str(self)}")
--> 138     return self._fit(reference_data, *args, **kwargs)
    139 except InvalidArgumentsException:
    140     raise

File ~/.conda/envs/nanny/lib/python3.10/site-packages/nannyml/usage_logging.py:189, in log_usage.<locals>.logging_decorator.<locals>.logging_wrapper(*args, **kwargs)
    187 finally:
    188     if runtime_exception is not None:
--> 189         raise runtime_exception
    190     else:
    191         return res

File ~/.conda/envs/nanny/lib/python3.10/site-packages/nannyml/usage_logging.py:142, in log_usage.<locals>.logging_decorator.<locals>.logging_wrapper(*args, **kwargs)
    139 runtime_exception, res = None, None
    140 try:
    141     # run original function
--> 142     res = func(*args, **kwargs)
    143 except BaseException as exc:
    144     runtime_exception = exc

File ~/.conda/envs/nanny/lib/python3.10/site-packages/nannyml/drift/univariate/calculator.py:109, in UnivariateDriftCalculator._fit(self, reference_data, *args, **kwargs)
    104 self.continuous_column_names, self.categorical_column_names = _split_features_by_type(
    105     reference_data, self.column_names
    106 )
    108 for column_name in self.continuous_column_names:
--> 109     self._column_to_models_mapping[column_name] += [
    110         MethodFactory.create(key=method, feature_type=FeatureType.CONTINUOUS, chunker=self.chunker).fit(
    111             reference_data[column_name]
    112         )
    113         for method in self.continuous_method_names
    114     ]
    116 for column_name in self.categorical_column_names:
    117     self._column_to_models_mapping[column_name] += [
    118         MethodFactory.create(key=method, feature_type=FeatureType.CATEGORICAL, chunker=self.chunker).fit(
    119             reference_data[column_name]
    120         )
    121         for method in self.categorical_method_names
    122     ]

File ~/.conda/envs/nanny/lib/python3.10/site-packages/nannyml/drift/univariate/calculator.py:110, in <listcomp>(.0)
    104 self.continuous_column_names, self.categorical_column_names = _split_features_by_type(
    105     reference_data, self.column_names
    106 )
    108 for column_name in self.continuous_column_names:
    109     self._column_to_models_mapping[column_name] += [
--> 110         MethodFactory.create(key=method, feature_type=FeatureType.CONTINUOUS, chunker=self.chunker).fit(
    111             reference_data[column_name]
    112         )
    113         for method in self.continuous_method_names
    114     ]
    116 for column_name in self.categorical_column_names:
    117     self._column_to_models_mapping[column_name] += [
    118         MethodFactory.create(key=method, feature_type=FeatureType.CATEGORICAL, chunker=self.chunker).fit(
    119             reference_data[column_name]
    120         )
    121         for method in self.categorical_method_names
    122     ]

File ~/.conda/envs/nanny/lib/python3.10/site-packages/nannyml/drift/univariate/methods.py:152, in MethodFactory.create(cls, key, feature_type, **kwargs)
    149     raise InvalidArgumentsException(f"cannot create method given a '{type(key)}'. Please provide a string.")
    151 if key not in cls.registry:
--> 152     raise InvalidArgumentsException(
    153         f"unknown method key '{key}' given. "
    154         "Should be one of ['kolmogorov_smirnov', 'jensen_shannon', 'wasserstein', 'chi2', "
    155         "'jensen_shannon', 'l_infinity', 'hellinger']."
    156     )
    158 if feature_type not in cls.registry[key]:
    159     raise InvalidArgumentsException(f"method {key} does not support {feature_type.value} features.")

InvalidArgumentsException: unknown method key 'k' given. Should be one of ['kolmogorov_smirnov', 'jensen_shannon', 'wasserstein', 'chi2', 'jensen_shannon', 'l_infinity', 'hellinger'].

The problem seems to be that the user incorrectly gave a string instead of a list of strings.
However instead of getting a proper error that he should give a list, he gets an error of unknown method k.

I m assuming (because I didn't dig into the code) the string is turned into a list and then the first letter is attempted to be matched against our string options.

P.S. Alternatively we can also accept a string here and handle it appropriately. Nice to have but less important than an error that correctly points out what went wrong.

@nikml nikml added bug Something isn't working triage Needs to be assessed labels Dec 5, 2022
@nnansters nnansters removed the triage Needs to be assessed label Dec 5, 2022
@nnansters
Copy link
Contributor

Hey Nikos,

your assumption is correct indeed, the string is interpreted as a list and because of that no match will be found.
We've introduced a similar behavior elsewhere, allowing for both a single string as well as a list of strings.

I suggest we do the same here and for other similar cases as well.

nnansters added a commit that referenced this issue Dec 14, 2022
…/estimator creation (#165)

Signed-off-by: niels <niels@nannyml.com>
nnansters added a commit that referenced this issue Dec 15, 2022
Signed-off-by: niels <niels@nannyml.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants