Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Fix nested_univ converter inconsistent handling of index level names #7026

Merged
merged 5 commits into from
Aug 26, 2024

Conversation

pranavvp16
Copy link
Contributor

This PR fixes the nested_univ converter, to return the index name of series nested in the dataframe.
Fixes #7025

@pranavvp16 pranavvp16 changed the title Fix nested_univ converter [BUG] Fix nested_univ converter Aug 23, 2024
@pranavvp16
Copy link
Contributor Author

pranavvp16 commented Aug 24, 2024

@fkiraly The failure in run notebooks is caused by this change, as we no longer return timepoints as name in last level. I tried checking it, the names of index returned is [None, None] and when we do reset_index it turns out to be [level_0, level_1]. The below code demonstrates the failure.

  1. load_arrow_head which is nested_univ mtype
  2. convert it to pd.multiindex where the index names are [None, None]
  3. do reset index, now the index becomes column named as level_0, level_1
  4. now when we try to set index through columns. KeyError occurs as we are trying to set timepoints which is not present
from sktime.datasets import load_arrow_head
from sktime.datatypes import convert_to

df_panel = load_arrow_head(split="TRAIN")[0]
df_panel = convert_to(df_panel, "pd-multiindex").reset_index() 
df_panel = df_panel.set_index(["level_0", "timepoints"])

Changing set_index to

df_panel = df_panel.set_index(["level_0", "level_1"])

Fixes it because now the correct column name is being returned.

@fkiraly fkiraly added module:datatypes datatypes module: data containers, checkers & converters bugfix Fixes a known bug or removes unintended behavior labels Aug 24, 2024
@fkiraly fkiraly changed the title [BUG] Fix nested_univ converter [BUG] Fix nested_univ converter inconsistent handling of index level names Aug 24, 2024
@fkiraly
Copy link
Collaborator

fkiraly commented Aug 24, 2024

that seems like a reasonable fix - btw, I think the loader should just use the return_type argument here instead of attempting explicit conversion.

This highlights that this might break user code, although only under very brittle circumstances, so we may consider shipping it only with the full minor version.

@pranavvp16
Copy link
Contributor Author

that seems like a reasonable fix - btw, I think the loader should just use the return_type argument here instead of attempting explicit conversion.

Do you mean something like this ??

df_panel = load_arrow_head(split="TRAIN", return_type="pd-multiindex")[0]

This highlights that this might break user code, although only under very brittle circumstances, so we may consider shipping it only with the full minor version.

Is this good to go now?? so I will fix the failures in notebook too

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 25, 2024

yes, that's what I meant - the notebook should of course run with this.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above, the notebook ought to run

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@pranavvp16
Copy link
Contributor Author

see above, the notebook ought to run

made changes in the notebook

@fkiraly fkiraly merged commit ab0f1f9 into sktime:main Aug 26, 2024
44 checks passed
@pranavvp16 pranavvp16 deleted the nested_univ branch September 26, 2024 12:59
benHeid pushed a commit to Z-Fran/sktime that referenced this pull request Oct 3, 2024
…l names (sktime#7026)

This PR fixes the `nested_univ` converter, to return the index name of
series nested in the dataframe.

Fixes sktime#7025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Fixes a known bug or removes unintended behavior module:datatypes datatypes module: data containers, checkers & converters
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG] nested_univ to pd.multtiindex conversion returns wrong index names
2 participants