skimage.exposure.rescale_intensity uses far more memory than necessary #7199
Open
Description
Description:
rescale_intensity()
uses temporary arrays, because doing fast vectorized operations with NumPy requires temporary arrays. This means much more memory usage than necessary. This can be avoided with Cython or Numba so the operation usesfor
loops, but I understand why that may not be appealing (Numba is pretty easy to use though!).- Even sticking to temporary arrays,
rescale_intensity()
creates too many temporary arrays concurrently: once the initial one is created, in-place operations should suffice given it's a temporary.
Is this a problem? Not necessarily. But if someone passes in a large uint8 array, and now you're creating multiple temporary float64 arrays, then peak memory usage is much much higher than a user might expect.
Other notes
- A naive Numba version runs much faster, albeit with the one time cost of precompilation. An optimized Numba version can run 4× faster than the naive Numba version.
- Users can do rescaling in chunks, as a workaround. However, memory usage tends to be opaque, it's harder to notice than "oh this function is slow", and so it's quite likely people won't notice that this an issue so they won't know to do a workaround.
- I suspect this is an issue in other skimage APIs as well.
Way to reproduce:
Note that getrusage may have different units on macOS, and won't work on Windows.
from resource import getrusage, RUSAGE_SELF
from skimage.exposure import rescale_intensity
import numpy as np
print("Baseline max resident memory in MB (Linux)", getrusage(RUSAGE_SELF).ru_maxrss / 1024)
image = np.ones((10_000, 10_000), dtype=np.uint8)
print("Initial image max rss in MB (Linux)", getrusage(RUSAGE_SELF).ru_maxrss / 1024)
rescaled = rescale_intensity(image)
print("Rescaling max rss in MB (Linux)", getrusage(RUSAGE_SELF).ru_maxrss / 1024)
When I run this I get:
Baseline max resident memory in MB (Linux) 36.78515625
Initial image max rss in MB (Linux) 132.15625
Rescaling max rss in MB (Linux) 1753.453125
Consider the two scenarios above:
- If you switched to a for loop using Cython/Numba, max resident memory would be ~240MB when rescaling: the input + output array and that's it.
- If you stuck to temporary arrays, but did it more efficiently, max resident memory could be as low as 1040MB: the input array + a single float64 temporary arrays + a single output array, for a total of 10× the initial array's memory usage.
In practice, memory usage is neither 1040MB nor 240MB, it's 1750MB, suggesting there's two concurrent floating point temporary arrays.
Version information:
>>> import sys; print(sys.version)
3.11.5 (main, Aug 25 2023, 13:19:50) [GCC 11.4.0]
>>> import platform; print(platform.platform())
Linux-6.5.4-76060504-generic-x86_64-with-glibc2.35
>>> import skimage; print(f'scikit-image version: {skimage.__version__}')
scikit-image version: 0.21.0
>>> import numpy; print(f'numpy version: {numpy.__version__}')
numpy version: 1.25.2
0.22 has the same issue, though.