Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up the application of PiecewiseAffineTransform #6963

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

svepe
Copy link

@svepe svepe commented May 26, 2023

The speedup is achieved by replacing manual iteration over individual affine transforms with np.einsum.

Closes #6864

Replace manual iteration over individual affine transforms with np.einsum
@svepe svepe force-pushed the piecewise-affine-speedup branch from 7869e93 to 4b55de1 Compare May 26, 2023 19:09
@lagru lagru self-requested a review May 26, 2023 19:39
@23pointsNorth
Copy link
Contributor

Any updates on this? @lagru @stefanv

@lagru
Copy link
Member

lagru commented Dec 29, 2023

Hey, sorry for our delayed response on this. I'll have a look.

@lagru lagru self-assigned this Dec 29, 2023
@lagru
Copy link
Member

lagru commented Dec 29, 2023

I am noticing that for the old and new implementation the number seems to change a bit. E.g. for the example below the residuals change on a scale <1e-15.

import skimage as ski
src = np.array([[0, 3], [0, -1], [1, 2], [-1, 0], [1, 1], [-1, 1]])
dst = np.array([[0, 0], [1, 0], [1, 1], [1, 2], [0, 2], [0, 1]])
tform = ski.transform.PiecewiseAffineTransform()
tform.estimate(src, dst)
tform.residuals(src, dst)

Probably floating point errors but not sure yet? I don't like that our tests don't catch this.

While we are at it, would you mind adapting the above example with outputs as a doctest? :)

@lagru
Copy link
Member

lagru commented Dec 29, 2023

With

import numpy as np
import skimage as ski
rng = np.random.default_rng(42)
src = rng.random((100, 2))
dst = rng.random((100, 2))
tform = ski.transform.PiecewiseAffineTransform()
tform.estimate(src, dst)
x = rng.random((2000, 2))

%timeit tform(x)  # requires ipython

the propsed solution seems to be ~7 times faster. 👍

@svepe
Copy link
Author

svepe commented Jan 4, 2024

I've added the doctest to the estimate function.

@lagru
Copy link
Member

lagru commented Jan 5, 2024

Thanks @svepe! I tried to address the test failure in 949cf8a.

@svepe
Copy link
Author

svepe commented Jan 10, 2024

@lagru Let me know if there is anything else needed on my end to merge the PR.

@lagru
Copy link
Member

lagru commented Jan 11, 2024

@svepe, I think this looks good. However, I am hesitant to approve this because this performance optimization does change our results, if only slightly. We have a policy to to never silently change the results if input stays the same. And because we don't seem to have a good test coverage for this, I am hesitant to assume from a single doctest that this is only noise due to floating errors.

Let's see what @scikit-image/core thinks and I'll bring it up in the next meeting as well.

@lagru
Copy link
Member

lagru commented Jan 11, 2024

Point in favor of merging this is that the snippet below seems to pass always. We use assert_allclose elsewhere to check our output with a relative tolerance of 1e-7. So we might consider this change just noise.

Details

import numpy as np
import skimage as ski

class NewPiecewiseAffineTransform(ski.transform.PiecewiseAffineTransform):
    def __call__(self, coords):
        coords = np.asarray(coords)
        # determine triangle index for each coordinate
        simplex = self._tesselation.find_simplex(coords)
        # stack of affine transforms to be applied to every coord
        affines = np.stack([affine.params for affine in self.affines])[simplex]
        # convert coords to homgeneous points
        points = np.c_[coords, np.ones((coords.shape[0], 1))]
        # apply affine transform to every point
        result = np.einsum("ikj,ij->ik", affines, points)
        # coordinates outside of mesh
        result[simplex == -1, :] = -1
        # convert back to 2d coords
        result = result[:, :2]
        return result

rng = np.random.default_rng(42)
for _ in range(100):
    src = rng.random((100, 2))
    dst = rng.random((100, 2))
    tform = ski.transform.PiecewiseAffineTransform()
    tform.estimate(src, dst)
    tform2 = NewPiecewiseAffineTransform()
    tform2.estimate(src, dst)
    x = rng.random((2000, 2))
    np.testing.assert_allclose(tform(x), tform2(x))

Additional disclaimer before approving this: I still need to wrap my head around the einsum magic happening here. 😅

@svepe
Copy link
Author

svepe commented Jan 11, 2024

The code before used a python loop iterating over simplices and at every iteration would i) get the simplex affine matrix, ii) get the points on the simplex and iii) apply the affine transform. The slowest bit is the loop, so the key idea is to match every point with its corresponding affine matrix and let numpy do the multiplication for all points.

So the proposed code does 2 things:

  1. Construct a stack of affine matrices (affines in the code) such that affines[i] contains the transformation matrix for the i-th point (points[i]). This allows us to apply the entire PiecewiseTransform with something like
for i in range(len(points)):
    result[i] = affines[i] * points[i]

but that still has the python loop.

  1. So to do the multiplication in numpy we can use einsum. Multiplying a single matrix T with a single vector p with einsum can be done with q = np.einsum("kj, j -> k". T, p) which in math notation is equivalent to

image
But we want to do multiple matrix-point multiplications (affines[i] * points[i]) hence we do q = np.einsum("ikj, ij -> ik", T, p) which is equivalent to

image
where now p is a 2d matrix (points) and T is a 3d matrix (affines). Hopefully, this demystifies it a bit. Probably can be done with numpy.dot and a couple of extra dimensions and transpositions.

@stefanv
Copy link
Member

stefanv commented Jan 11, 2024

This optimization seems fine. I would recommend that we keep a version of the (original) code in a comment that explains what's happening under the hood. A text comment may also help with that. The slight change in residual is not a concern, being so small.

@23pointsNorth
Copy link
Contributor

@stefanv Do you mean appending

        # The below is equivalent to the following code, speed up by replacing manual iteration with einsum
        # for index in range(len(self._tesselation.simplices)):
            # Affine transform for triangle
            # affine = self.affines[index]
            # All coordinates within triangle
            # index_mask = simplex == index
            # out[index_mask, :] = affine(coords[index_mask, :])

Before the main/einsum call?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10x speed up for transform.PiecewiseAffineTransform
4 participants