Draft: MAINT: Improve performance of common_type and use result_type internally #24656
Conversation
```diff
-        return array_type[1][precision]
-    else:
-        return array_type[0][precision]
+    common_type = _multiarray_umath.result_type(*arrays).type
```
OK, this will possibly make it a bit slower, but I think I would like to:

- Use `*(a.dtype for a in arrays)`, because unfortunately `result_type` supports both array and dtype inputs, which would broaden the signature here.
- We can add `0.0` as an additional argument to `result_type()` to enforce the `integer -> float64` transition.

Unfortunately (in terms of compatibility), this will improve the situation when mixing integers and floats: `float32 + int16` will actually promote to stay `float32`, while here any integer forces `float64` precision.
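The effect of prepending `0.0` can be checked directly with the public `np.result_type` (a minimal sketch; the PR itself calls the private `_multiarray_umath` binding):

```python
import numpy as np

# Prepending 0.0 still forces pure-integer inputs up to a floating type,
# while an existing float32 dtype can survive the promotion; the exact
# result types depend on the active promotion rules (legacy value-based
# vs. NEP 50 weak) and the NumPy version, so none are asserted here.
pure_int = np.result_type(0.0, np.dtype(np.int16))
mixed = np.result_type(0.0, np.dtype(np.int16), np.dtype(np.float32))
print(pure_int, mixed)
```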
- This is to make sure people do not pass a `dtype` directly to `common_type`? It does make the code a bit slower, but the overhead is still OK.
- We could use `common_type = _multiarray_umath.result_type(0., *arrays).type` and skip the check on integer types. A pure int then goes to `float64` (as on current main), but a disadvantage is that `float32 + float32` now goes to `float64` (where before it would be `float32`).
With

```python
def common_type_with_zero(*arrays):
    common_type = _multiarray_umath.result_type(0., *[a.dtype for a in arrays]).type
    if not issubclass(common_type, _nx.inexact):
        raise TypeError("can't get common type for non-numeric array")
    return common_type
```

and

```python
i16 = np.array(2, dtype=np.int16)
i32 = np.array(2, dtype=np.int32)
f32 = np.array(2, dtype=np.float32)

for c in [common_type_original, common_type_pr24467, common_type_pr, common_type_with_zero]:
    print(f'\n# {c}')
    print(f'c(i16, i32): {c(i16, i32)}')
    print(f'c(i16, f32): {c(i16, f32)}')
    print(f'c(f32, f32): {c(f32, f32)}')
```
I get

```
# <function common_type_original at 0x000001EE32CE5F80>
c(i16, i32): <class 'numpy.float64'>
c(i16, f32): <class 'numpy.float64'>
c(f32, f32): <class 'numpy.float32'>

# <function common_type_pr24467 at 0x000001EE32CE5800>
c(i16, i32): <class 'numpy.float64'>
c(i16, f32): <class 'numpy.float64'>
c(f32, f32): <class 'numpy.float32'>

# <function common_type_pr at 0x000001EE32CE51C0>
c(i16, i32): <class 'numpy.float64'>
c(i16, f32): <class 'numpy.float32'>
c(f32, f32): <class 'numpy.float32'>

# <function common_type_with_zero at 0x000001EE32CE5DA0>
c(i16, i32): <class 'numpy.float64'>
c(i16, f32): <class 'numpy.float64'>
c(f32, f32): <class 'numpy.float64'>
```
Performance for main, PR24467, this PR, and the variation with zero:

```
1.24 µs ± 79.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
740 ns ± 55.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
333 ns ± 16.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
998 ns ± 20 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```
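The numbers above look like `%timeit` output; a stand-alone equivalent can be sketched with `timeit` (the benchmarked call and the input arrays here are assumptions, not the exact script used):

```python
import timeit

import numpy as np

def common_type_with_zero(*arrays):
    # The "with zero" variant under discussion: prepend 0. and pass dtypes.
    common = np.result_type(0., *[a.dtype for a in arrays]).type
    if not issubclass(common, np.inexact):
        raise TypeError("can't get common type for non-numeric array")
    return common

f32 = np.array(2, dtype=np.float32)
n = 100_000
t = timeit.timeit(lambda: common_type_with_zero(f32, f32), number=n)
print(f"{t / n * 1e9:.0f} ns per call")  # machine-dependent
```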
At this point I have a slight preference for PR 24467, but it is a bit of an apples-and-oranges comparison.
To me, those results look like you ran it with `result_type(0, *arrays)` and not `result_type(0., *[a.dtype for a in arrays])`. But yes, you still get the change for integers, which is maybe for the better, but it is a change.
This really was with `result_type(0., *[a.dtype for a in arrays])`. For `result_type(0, *[a.dtype for a in arrays])` the case `common_type(np.array([0]), np.array([0]))` results in a `TypeError`.
@eendebakpt not sure it matters given the other problems, but you wrote `0` rather than `0.` there, which is very different. I just tried it to cover my ground triply: `common_type_with_zero` will give `f32` for `f32, f32`. (The problem is `i16, f32`.)
@eendebakpt I am still confused: at the least, `np.result_type(0.0, np.int16, np.float32)` gives `float32`, which is a change in behavior, no (or maybe I am mixing up which version is which)?
I am almost tempted to live with that change of behavior, since we have a lot of other promotion-related changes, but I have to think about it a bit.
(I suppose my ask is just to confirm that it is a change in behavior. In a sense, the old behavior is more like `result_type(*(result_type(0.0, a.dtype) for a in arrays))`...)
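The distinction can be made concrete; a sketch contrasting the flat call with the per-array variant described above (no expected outputs shown, since the results depend on the active promotion rules and NumPy version):

```python
import numpy as np

arrays = [np.array(2, dtype=np.int16), np.array(2, dtype=np.float32)]

# Flat: 0.0 takes part in one combined promotion over all dtypes at once.
flat = np.result_type(0.0, *(a.dtype for a in arrays))

# Per-array: each dtype is first promoted together with 0.0 (so int16 on its
# own already becomes a floating type), and only then are the intermediate
# results combined -- closer to the old table-based common_type behavior.
per_array = np.result_type(*(np.result_type(0.0, a.dtype) for a in arrays))

print(flat, per_array)
```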
- `np.result_type(0.0, np.int16, np.float32)` gives `float32` for promotion state `legacy` and `float64` for promotion state `weak`. But that change is unrelated to this or any of the other PRs (the PRs do not change `result_type` itself).
- For `common_type`, the output of `common_type(i16, f32)` is `float64` on main (for both weak and legacy promotion rules). For this PR the output of `common_type(i16, f32)` is `float32` (a behavior change). For MAINT: Improve performance of common_type and use result_type internally #24668 the output is `float64` (so no behaviour change).

Summary:

- MAINT: Optimize performance of np.common_type #24467: pure optimization, no behaviour changes at all.
- Draft: MAINT: Improve performance of common_type and use result_type internally #24656: fastest option, uses `result_type` internally; behaviour change for `i16, f32`, and it does broaden the signature.
- MAINT: Improve performance of common_type and use result_type internally #24668: uses `result_type` internally, which simplifies the code. The only behaviour change observed is for `datetime64`.
> np.result_type(0.0, np.int16, np.float32) gives float32 for promotion state legacy and float64 for promotion state weak.

What the heck! That looks wrong; that result shouldn't change with weak promotion! Some code path is mistriggering and thinking this cannot have any weak promotion at all, or so.
@seberg Not sure whether this helps, but the output depends on how the `i16` and `f32` values are passed (as numpy scalar, numpy array or dtype):

```python
import numpy as np
print(np, np.__version__)

i16scalar = np.int16(2)
f32scalar = np.float32(3.)
i16array = np.int16([2, 2])
f32array = np.float32([3., 3.])

for s in ['weak', 'legacy']:
    print(s)
    np._set_promotion_state(s)
    print('np.result_type(0.0, i16scalar, f32scalar)', np.result_type(0.0, i16scalar, f32scalar))
    print('np.result_type(0.0, i16array, f32array)', np.result_type(0.0, i16array, f32array))
    print('np.result_type(0.0, i16.dtype, f32.dtype)', np.result_type(0.0, i16scalar.dtype, f32scalar.dtype))
    print('np.result_type(0.0, f32)', np.result_type(0.0, f32scalar))
```
@seberg I am going to close this PR in favor of #24668 (which includes your two suggestions). The performance of this PR is a bit better, but maybe not worth the downsides. The only usage of `np.common_type` is in `np.polynomial.polyutils`, and there we could just inline `np.result_type` if needed.

I wrote earlier that `np.result_type(0.0, np.int16, np.float32)` gives `float32` for promotion state `legacy` and `float64` for promotion state `weak`, but actually it is the other way around. For posterity: on current main these are the promotion rules:
```python
import numpy as np
print(np, np.__version__)

i16scalar = np.int16(2)
f32scalar = np.float32(3.)
i16array = np.int16([2, 2])
f32array = np.float32([3., 3.])

for s in ['weak', 'legacy']:
    print(f'\n\n### promotion {s}')
    np._set_promotion_state(s)
    print('np.result_type(i16scalar, f32scalar)', np.result_type(i16scalar, f32scalar))
    print('np.result_type(i16array, f32array)', np.result_type(i16array, f32array))
    print('np.result_type(i16.dtype, f32.dtype)', np.result_type(i16scalar.dtype, f32scalar.dtype))
    print('np.result_type(f32)', np.result_type(f32scalar))
    print('np.result_type(0.0, i16scalar, f32scalar)', np.result_type(0.0, i16scalar, f32scalar))
    print('np.result_type(0.0, i16array, f32array)', np.result_type(0.0, i16array, f32array))
    print('np.result_type(0.0, i16.dtype, f32.dtype)', np.result_type(0.0, i16scalar.dtype, f32scalar.dtype))
    print('np.result_type(0.0, f32)', np.result_type(0.0, f32scalar))
    print('np.result_type(0.0, np.int32, np.float32)', np.result_type(0.0, np.int32, np.float32))
    print()
    print('np.result_type(0., i16scalar, f32scalar)', np.result_type(0., i16scalar, f32scalar))
    print('np.result_type(0., i16scalar.dtype, f32scalar.dtype)', np.result_type(0., i16scalar.dtype, f32scalar.dtype))
    print('np.result_type(0.,i16array, f32array)', np.result_type(0., i16array, f32array))
    print('np.result_type(0.,i16array.dtype, f32array.dtype)', np.result_type(0., i16array.dtype, f32array.dtype))
```
Results in:

```
### promotion weak
np.result_type(i16scalar, f32scalar) float32
np.result_type(i16array, f32array) float32
np.result_type(i16.dtype, f32.dtype) float32
np.result_type(f32) float32
np.result_type(0.0, i16scalar, f32scalar) float32
np.result_type(0.0, i16array, f32array) float32
np.result_type(0.0, i16.dtype, f32.dtype) float32
np.result_type(0.0, f32) float32
np.result_type(0.0, np.int32, np.float32) float64

np.result_type(0., i16scalar, f32scalar) float32
np.result_type(0., i16scalar.dtype, f32scalar.dtype) float32
np.result_type(0.,i16array, f32array) float32
np.result_type(0.,i16array.dtype, f32array.dtype) float32

### promotion legacy
np.result_type(i16scalar, f32scalar) float32
np.result_type(i16array, f32array) float32
np.result_type(i16.dtype, f32.dtype) float32
np.result_type(f32) float32
np.result_type(0.0, i16scalar, f32scalar) float64
np.result_type(0.0, i16array, f32array) float32
np.result_type(0.0, i16.dtype, f32.dtype) float32
np.result_type(0.0, f32) float64
np.result_type(0.0, np.int32, np.float32) float64

np.result_type(0., i16scalar, f32scalar) float64
np.result_type(0., i16scalar.dtype, f32scalar.dtype) float32
np.result_type(0.,i16array, f32array) float32
np.result_type(0.,i16array.dtype, f32array.dtype) float32
```
Closing in favor of #24668.
Refactor `common_type` to use `result_type` internally. The performance is better than in the alternative PR #24467.

Results (not very stable, measured on Windows): with script
The refactoring of `np.common_type` is not fully backwards compatible. In particular, the polynomial classes break with coefficients of type `np.timedelta64` (but that might be acceptable).
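The `np.timedelta64` breakage has a plausible explanation in NumPy's scalar type hierarchy: `np.timedelta64` is a subclass of `np.signedinteger`, so the old table-based `common_type` treated it as an integer (returning `float64`), whereas a `result_type`-based version runs into an invalid promotion. A small check of both facts:

```python
import numpy as np

# timedelta64 sits under signedinteger in the scalar type hierarchy, which is
# why the integer branch of the old common_type caught it.
print(issubclass(np.timedelta64, np.signedinteger))  # True

# result_type refuses to promote timedelta64 with a float dtype.
try:
    np.result_type(np.dtype('timedelta64[s]'), np.dtype(np.float64))
except TypeError as exc:
    print("promotion failed:", exc)
```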