DataReconstructionDriftCalculator reports alert thresholds in an interchanged way #179

Closed
@BigNerd

Description

Describe the bug
DataReconstructionDriftCalculator.calculate returns a result whose result.data data frame contains, among others, the two columns reconstruction_error/lower_threshold and reconstruction_error/upper_threshold.
However, the two are swapped: the column named lower_threshold holds the upper threshold values, and the column named upper_threshold holds the lower threshold values.
The cause is a reconfiguration of the column index in the class's _calculate method, which overwrites the original column names with new multi-level column names. The new index lists the columns in a different order than the old one, so the labels no longer match the data underneath them.
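The mechanism described above can be reproduced with pandas alone. The following is a minimal sketch, not the actual NannyML code: the flat column names, their order, and the values are assumptions made up for illustration. It shows that assigning a new column index relabels columns purely by position, and that building the new labels from an explicit old-name-to-new-name mapping avoids the mismatch.

```python
import pandas as pd

# Hypothetical flat frame; names, order and values are illustrative only.
flat = pd.DataFrame({
    "reconstruction_error": [0.80, 0.82],
    "upper_threshold": [0.90, 0.90],
    "lower_threshold": [0.70, 0.70],
})

# Overwriting the column index relabels by position. The new labels list
# lower before upper, while the underlying data has upper first.
mislabeled = flat.copy()
mislabeled.columns = pd.MultiIndex.from_tuples([
    ("reconstruction_error", "value"),
    ("reconstruction_error", "lower_threshold"),
    ("reconstruction_error", "upper_threshold"),
])

# Safer pattern: derive each new label from the old column name,
# so the result does not depend on column order.
mapping = {
    "reconstruction_error": ("reconstruction_error", "value"),
    "lower_threshold": ("reconstruction_error", "lower_threshold"),
    "upper_threshold": ("reconstruction_error", "upper_threshold"),
}
relabeled = flat.copy()
relabeled.columns = pd.MultiIndex.from_tuples([mapping[c] for c in flat.columns])

# First-row values under each labeling scheme.
bad_lower = mislabeled[("reconstruction_error", "lower_threshold")].iloc[0]
good_lower = relabeled[("reconstruction_error", "lower_threshold")].iloc[0]
good_upper = relabeled[("reconstruction_error", "upper_threshold")].iloc[0]
```

In this sketch bad_lower ends up holding the upper threshold value, which mirrors the symptom reported here, while the mapping-based version keeps each label attached to its own data.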

To Reproduce
Steps to reproduce the behavior:
import nannyml as nml
reference = nml.load_synthetic_binary_classification_dataset()[0]
analysis = nml.load_synthetic_binary_classification_dataset()[1]
feature_column_names = [
    col for col in reference.columns if col not in [
        'timestamp', 'y_pred_proba', 'period', 'y_pred', 'work_home_actual', 'identifier'
    ]]

calc = nml.DataReconstructionDriftCalculator(
    column_names=feature_column_names,
    timestamp_column_name='timestamp',
    chunk_size=5000
)
calc.fit(reference)
result = calc.calculate(analysis)

Expected behavior
result.data["reconstruction_error"]["lower_threshold"][0] < result.data["reconstruction_error"]["upper_threshold"][0]
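The expectation above only looks at the first chunk. A check over every row could be sketched as follows; thresholds_are_ordered is a hypothetical helper, not part of NannyML, and the two frames are hand-built stand-ins for result.data with made-up values.

```python
import pandas as pd


def thresholds_are_ordered(data: pd.DataFrame,
                           metric: str = "reconstruction_error") -> bool:
    """Return True if lower_threshold < upper_threshold in every row."""
    lower = data[(metric, "lower_threshold")]
    upper = data[(metric, "upper_threshold")]
    return bool((lower < upper).all())


# Hand-built stand-ins for result.data; values are illustrative only.
columns = pd.MultiIndex.from_tuples([
    ("reconstruction_error", "lower_threshold"),
    ("reconstruction_error", "upper_threshold"),
])
swapped = pd.DataFrame([[0.9, 0.7], [0.9, 0.7]], columns=columns)
ordered = pd.DataFrame([[0.7, 0.9], [0.7, 0.9]], columns=columns)

swapped_ok = thresholds_are_ordered(swapped)
ordered_ok = thresholds_are_ordered(ordered)
```

Assuming result.data has the two-level columns described in this report, calling thresholds_are_ordered(result.data) on the reproduction output would be expected to return False while the bug is present.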

Labels: bug
