
Draft: MAINT: Improve performance of common_type and use result_type internally #24656

Closed
wants to merge 6 commits

Conversation

@eendebakpt (Contributor) commented Sep 6, 2023:

Refactor common_type to use result_type internally. The performance is better than the alternative PR #24467.

Results (not very stable; measured on Windows):

main: 1.8 µs ± 81.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
#24467: 1.13 µs ± 35.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
PR: 799 ns ± 55.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

with the script:

import numpy as np
x = np.array([1.])
%timeit np.common_type(x, x)

The refactoring of np.common_type is not fully backwards compatible. In particular, the polynomial classes break with coefficients of type np.timedelta64 (though that might be acceptable).

The inline review comments below are on this hunk of the diff:

-        return array_type[1][precision]
-    else:
-        return array_type[0][precision]
+    common_type = _multiarray_umath.result_type(*arrays).type
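
For context, a minimal sketch of the refactored function, reconstructed from the diff hunk above and the results reported further down; the exact PR code may differ, and the integer branch in particular is inferred from the reported c(i16, i32) -> float64 behavior:

from numpy.core import _multiarray_umath
import numpy.core.numeric as _nx

def common_type(*arrays):
    # Let result_type do the promotion instead of the old hand-written
    # precision lookup tables.
    common_type = _multiarray_umath.result_type(*arrays).type
    if issubclass(common_type, _nx.integer):
        # common_type only returns inexact types: integer input
        # promotes to double precision (this branch is inferred).
        return _nx.double
    if not issubclass(common_type, _nx.inexact):
        raise TypeError("can't get common type for non-numeric array")
    return common_type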
@seberg (Member) commented:

OK, this will possibly make it a bit slower, but I think I would like to:

  1. Use *(a.dtype for a in arrays), because unfortunately result_type supports both array and dtype inputs, which would broaden the signature here.
  2. We can add 0.0 as an additional argument to result_type() to enforce the integer -> float64 transition.

Unfortunately, this still changes behavior when mixing integers and floats (arguably for the better): float32 + int16 will now promote to stay float32, while currently any integer forces float64 precision.

@eendebakpt (Contributor, Author) replied:

  1. This is to make sure people do not pass a dtype directly to common_type? It does make the code a bit slower, but the overhead is still acceptable.
  2. We could use common_type = _multiarray_umath.result_type(0., *arrays).type and skip the check on the integer type. Pure integer input goes to float64 (as on current main), but a disadvantage is that now float32 + float32 goes to float64 (where before it would be float32).

With

from numpy.core import _multiarray_umath
import numpy.core.numeric as _nx

def common_type_with_zero(*arrays):
    # the 0. is intended to force pure-integer input up to float64
    common_type = _multiarray_umath.result_type(0., *[a.dtype for a in arrays]).type
    if not issubclass(common_type, _nx.inexact):
        raise TypeError("can't get common type for non-numeric array")
    return common_type

i16 = np.array(2, dtype=np.int16)
i32 = np.array(2, dtype=np.int32)
f32 = np.array(2, dtype=np.float32)

for c in [common_type_original, common_type_pr24467, common_type_pr, common_type_with_zero]:
    print(f'\n# {c}')
    print(f'c(i16, i32): {c(i16, i32)}')
    print(f'c(i16, f32): {c(i16, f32)}')
    print(f'c(f32, f32): {c(f32, f32)}')

I get

# <function common_type_original at 0x000001EE32CE5F80>
c(i16, i32): <class 'numpy.float64'>
c(i16, f32): <class 'numpy.float64'>
c(f32, f32): <class 'numpy.float32'>

# <function common_type_pr24467 at 0x000001EE32CE5800>
c(i16, i32): <class 'numpy.float64'>
c(i16, f32): <class 'numpy.float64'>
c(f32, f32): <class 'numpy.float32'>

# <function common_type_pr at 0x000001EE32CE51C0>
c(i16, i32): <class 'numpy.float64'>
c(i16, f32): <class 'numpy.float32'>
c(f32, f32): <class 'numpy.float32'>

# <function common_type_with_zero at 0x000001EE32CE5DA0>
c(i16, i32): <class 'numpy.float64'>
c(i16, f32): <class 'numpy.float64'>
c(f32, f32): <class 'numpy.float64'>

Performance for main, PR #24467, this PR, and the variation with zero:

main:      1.24 µs ± 79.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
#24467:    740 ns ± 55.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
this PR:   333 ns ± 16.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
with zero: 998 ns ± 20 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

At this point I have a slight preference for PR #24467, but it is a bit of an apples-to-oranges comparison.

@seberg (Member) replied:

To me, those results look like you ran it with result_type(0, *arrays) and not result_type(0., *[a.dtype for a in arrays]). But yes, you still get the change for integers, which is maybe for the better, but is a change.

@eendebakpt (Contributor, Author) replied:

This really was with result_type(0., *[a.dtype for a in arrays]). For result_type(0, *[a.dtype for a in arrays]) the case common_type(np.array([0]), np.array([0])) results in a TypeError.
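
For illustration, a small sketch of why the literal matters (np.int64 stands in here for the default integer dtype of np.array([0]); platform defaults may differ):

import numpy as np

# With an integer 0, pure-integer input stays integer, so the
# issubclass(..., inexact) check in common_type_with_zero raises:
print(np.result_type(0, np.dtype(np.int64)).type)   # <class 'numpy.int64'>

# With a float 0., the same input is pushed to a float type:
print(np.result_type(0., np.dtype(np.int64)).type)  # <class 'numpy.float64'>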

@seberg (Member) replied:

@eendebakpt not sure it matters given the other problems, but you wrote 0 rather than 0. there, which is very different. I just tried it again to triple-check: common_type_with_zero will give f32 for (f32, f32). (The problem case is (i16, f32).)

@seberg (Member) added:

@eendebakpt I am still confused: np.result_type(0.0, np.int16, np.float32) gives float32, which is a change in behavior, no (or maybe I am mixing up which version is which)?

I am almost tempted to live with that change of behavior since we have a lot of other promotion-related changes, but I have to think about it a bit.

(I suppose my ask is just to confirm that it is a change in behavior. In a sense, the old behavior is more like result_type(*(result_type(0.0, a.dtype) for a in arrays))...)
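
To make the contrast concrete, a sketch under legacy promotion; the float64 from result_type(0.0, int16) on its own is the assumption that makes the per-array form reproduce the old common_type result of float64 for (i16, f32):

import numpy as np

arrays = [np.array(2, dtype=np.int16), np.array(2, dtype=np.float32)]

# Single call, as in common_type_with_zero: 0.0 is weighed against the
# already-promoted float32, so the int16 never forces float64.
print(np.result_type(0.0, *(a.dtype for a in arrays)))   # float32

# Per-array form from above: 0.0 meets the int16 on its own and lifts it
# to float64 before the final promotion, matching the old behavior.
print(np.result_type(*(np.result_type(0.0, a.dtype) for a in arrays)))  # float64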

@eendebakpt (Contributor, Author) replied:

@seberg

  • np.result_type(0.0, np.int16, np.float32) gives float32 for promotion state legacy and float64 for promotion state weak. But that change is unrelated to this or any of the other PRs (the PRs do not change result_type itself).

  • For common_type the output of common_type(i16, f32) is float64 on main (for both weak and legacy promotion rules). For this PR the output of common_type(i16, f32) is float32 (a behavior change). For #24668 (MAINT: Improve performance of common_type and use result_type internally) the output is float64 (so no behavior change).

Summary: common_type(i16, f32) is float64 on main and with #24668, but float32 with this PR.

@seberg (Member) replied:

> np.result_type(0.0, np.int16, np.float32) gives float32 for promotion state legacy and float64 for promotion state weak.

What the heck! That looks wrong; that result shouldn't change with weak promotion! Some code path is mistriggering and thinking this cannot have any weak promotion at all, or so.

@eendebakpt (Contributor, Author) replied:

@seberg Not sure whether this helps, but the output depends on how the i16 and f32 are passed (as a numpy scalar, a numpy array, or a dtype):

import numpy as np
print(np, np.__version__)

i16scalar = np.int16(2)
f32scalar = np.float32(3.)

i16array = np.int16([2, 2])
f32array = np.float32([3., 3.])

for s in ['weak', 'legacy']:
    print(s)
    np._set_promotion_state(s)
    print('np.result_type(0.0, i16scalar, f32scalar)', np.result_type(0.0, i16scalar, f32scalar))
    print('np.result_type(0.0, i16array, f32array)', np.result_type(0.0, i16array, f32array))
    print('np.result_type(0.0, i16.dtype, f32.dtype)', np.result_type(0.0, i16scalar.dtype, f32scalar.dtype))
    print('np.result_type(0.0, f32)', np.result_type(0.0, f32scalar))


@eendebakpt (Contributor, Author) commented:

@seberg I am going to close this PR in favor of #24668 (which includes your two suggestions). The performance of this PR is a bit better, but maybe not worth the downsides. The only usage of np.common_type is in np.polynomial.polyutils, and there we could just inline np.result_type if needed.

I wrote earlier that np.result_type(0.0, np.int16, np.float32) gives float32 for promotion state legacy and float64 for promotion state weak, but actually it is the other way around. For posterity: on current main these are the promotion rules:

import numpy as np
print(np, np.__version__)

i16scalar = np.int16(2)
f32scalar = np.float32(3.)

i16array = np.int16([2, 2])
f32array = np.float32([3., 3.])

for s in ['weak', 'legacy']:
    print(f'\n\n### promotion {s}')
    np._set_promotion_state(s)
    print('np.result_type(i16scalar, f32scalar)', np.result_type(i16scalar, f32scalar))
    print('np.result_type(i16array, f32array)', np.result_type(i16array, f32array))
    print('np.result_type(i16.dtype, f32.dtype)', np.result_type(i16scalar.dtype, f32scalar.dtype))
    print('np.result_type(f32)', np.result_type(f32scalar))

    print('np.result_type(0.0, i16scalar, f32scalar)', np.result_type(0.0, i16scalar, f32scalar))
    print('np.result_type(0.0, i16array, f32array)', np.result_type(0.0, i16array, f32array))
    print('np.result_type(0.0, i16.dtype, f32.dtype)', np.result_type(0.0, i16scalar.dtype, f32scalar.dtype))
    print('np.result_type(0.0, f32)', np.result_type(0.0, f32scalar))

    print('np.result_type(0.0, np.int32, np.float32)', np.result_type(0.0, np.int32, np.float32))

    print()
    print('np.result_type(0., i16scalar, f32scalar)', np.result_type(0., i16scalar, f32scalar))
    print('np.result_type(0., i16scalar.dtype, f32scalar.dtype)', np.result_type(0., i16scalar.dtype, f32scalar.dtype))

    print('np.result_type(0.,i16array, f32array)', np.result_type(0., i16array, f32array))
    print('np.result_type(0.,i16array.dtype, f32array.dtype)', np.result_type(0., i16array.dtype, f32array.dtype))

Results in:


### promotion weak
np.result_type(i16scalar, f32scalar) float32
np.result_type(i16array, f32array) float32
np.result_type(i16.dtype, f32.dtype) float32
np.result_type(f32) float32
np.result_type(0.0, i16scalar, f32scalar) float32
np.result_type(0.0, i16array, f32array) float32
np.result_type(0.0, i16.dtype, f32.dtype) float32
np.result_type(0.0, f32) float32
np.result_type(0.0, np.int32, np.float32) float64

np.result_type(0., i16scalar, f32scalar) float32
np.result_type(0., i16scalar.dtype, f32scalar.dtype) float32
np.result_type(0.,i16array, f32array) float32
np.result_type(0.,i16array.dtype, f32array.dtype) float32


### promotion legacy
np.result_type(i16scalar, f32scalar) float32
np.result_type(i16array, f32array) float32
np.result_type(i16.dtype, f32.dtype) float32
np.result_type(f32) float32
np.result_type(0.0, i16scalar, f32scalar) float64
np.result_type(0.0, i16array, f32array) float32
np.result_type(0.0, i16.dtype, f32.dtype) float32
np.result_type(0.0, f32) float64
np.result_type(0.0, np.int32, np.float32) float64

np.result_type(0., i16scalar, f32scalar) float64
np.result_type(0., i16scalar.dtype, f32scalar.dtype) float32
np.result_type(0.,i16array, f32array) float32
np.result_type(0.,i16array.dtype, f32array.dtype) float32

@eendebakpt (Contributor, Author) commented:

Closing in favor of #24668.

eendebakpt closed this on Feb 26, 2024.