-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: improve MultiIndex get_loc performance #16346
Changes from 1 commit
3d3074b
d1d1513
664d2b3
7cd3cc1
3bd0404
287817a
638f011
8acc9e8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Loading branch information
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,7 +5,6 @@ | |
|
||
import numpy as np | ||
from pandas._libs import hashing | ||
from pandas.compat import string_and_binary_types, text_type | ||
from pandas.core.dtypes.generic import ( | ||
ABCMultiIndex, | ||
ABCIndexClass, | ||
|
@@ -14,6 +13,7 @@ | |
from pandas.core.dtypes.common import ( | ||
is_categorical_dtype, is_list_like) | ||
from pandas.core.dtypes.missing import isnull | ||
from pandas.core.dtypes.cast import infer_dtype_from_scalar | ||
|
||
|
||
# 16 byte long hashing key | ||
|
@@ -317,20 +317,8 @@ def _hash_scalar(val, encoding='utf8', hash_key=None): | |
# this is to be consistent with the _hash_categorical implementation | ||
return np.array([np.iinfo(np.uint64).max], dtype='u8') | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if you need to handle datetime w/tz directly (IOW, we basically ignore it), then I would.
I suppose an option to ignore tz is fine for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, I can certainly do that check here as well. That is maybe better to keep the custom logic here, as the keyword added to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. he i think better locally |
||
if isinstance(val, string_and_binary_types + (text_type,)): | ||
vals = np.array([val], dtype=object) | ||
else: | ||
vals = np.array([val]) | ||
|
||
if vals.dtype == np.object_: | ||
from pandas import Timestamp, Timedelta, Period, Interval | ||
if isinstance(val, (Timestamp, Timedelta)): | ||
vals = np.array([val.value]) | ||
elif isinstance(val, (Period, Interval)): | ||
pass | ||
else: | ||
from pandas import Index | ||
vals = Index(vals).values | ||
dtype, val = infer_dtype_from_scalar(val, use_datetimetz=False) | ||
vals = np.array([val], dtype=dtype) | ||
|
||
return hash_array(vals, hash_key=hash_key, encoding=encoding, | ||
categorize=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need this extra parameter, instead you can pass
pandas_dtype=True
. what is the issue?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas_dtype=True
does not return a np.datetime64 but our DatetimeTZDtype, and further then also Periods get converted, which I also don't need.So the combination I need, i.e. tz-aware timestamps to its numpy dtype instead of as pandas extension type or as object, but keep Periods as objects, is not possible with the current options. This is a bit a strange combination, but that's the consequence of how those are returned from the Index values (which is as datetime64 without tz but Periods as objects), which is how they are hashed.
But indeed, if we add this,
ignore_tz
is probably a better name