Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New data → color pipeline #28658

Merged
merged 18 commits into from
Oct 24, 2024
Merged

Conversation

trygvrad
Copy link
Contributor

@trygvrad trygvrad commented Aug 3, 2024

The Colorizer class, ColorizerShim, and ColorableArtist which replaces the old ScalarMappable.

PR summary

This PR implements the changes in #28428 where a container class is used to hold the $data\rightarrow color$ pipeline.
Schematically, this pipeline looks like this:
image
I have changed the names to be:

Mapper → Colorizer
ScalarMappableShim → ColorizerShim
ColorableArtist → ColorizingArtist

compared to the existing pipeline:
image

This is a minimal implementation so that the interested parties can take a look.
It does not include typing or new tests, but all existing tests pass


Note 1:

changed() on Colorizer and ColorableArtist is executes:

        self.callbacks.process('changed')

This is consistent with how the callbacks are handled on the norm
but different from how the callbacks are handled on ScalarMappable

        self.callbacks.process('changed', self)

I believe this it is more consistent if the new classes do self.callbacks.process('changed') and I have not found any example where the additional argument matters.


Note 2:

I would like to move Colorizer and ColorizerShim to a new file, instead of cm.py.
If this is done, cm.py would only contain ColormapRegistry (and ScalarMappable for compatibility).


Note 3:

Here are some points that could be addressed in following PRs:

  1. Colorbar does not need a reference to the artist, only the Colorizer. A small PR could change this, but there should probably be an example on how you can change the .colorizer on both an artist and a colorbar.
  2. With this PR, the following example can be greatly simplified, and should be updated: https://matplotlib.org/stable/gallery/images_contours_and_fields/multi_image.html
  3. The ColorableArtist class has a cbook.CallbackRegistry(), but I believe this to be superfluous as all callbacks should in reality come from the Colorizer.
  4. The bulk of Artist.format_cursor_data should be moved to ColorizableArtist.
  5. Fix the bug in updating hexbin [Bug]: Hexbin hbar and vbar not updating correctly on changed #28633
  6. There are still some scattered uses of cm.ScalarMappable, where artists need additional ones. These should be changed to use the new Colorizer.
  7. From @timhoffm : We may have to reconsider how auto-scaling is executed. Currently, the first array to be set triggers auto-scaling. This does not work well when multiple ColorizingArtists share a Colorizer and thus a Norm. It may be reasonable to defer scaling to the first time that the norm (or the colorizing pipeline) is executed and collect the data limits from all participating Artists at that time. However, this requires a connection from the Colorizer to all associated artists. - I suppose this should be handled on the Colorizer level, The norm itself should likely stay Artist-agnostic.

Note 4:

The following code demonstrates how multiple axes can be connected to one Colorizer object:

import matplotlib
matplotlib.use('qtagg')
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(19680801)
data = np.linspace(0.1, 0.6, 6)[:, np.newaxis, np.newaxis] * np.random.rand(6, 10, 20)

colrz = matplotlib.cm.Colorizer()
colrz.vmin = np.min(data)
colrz.vmax = np.max(data)

fig, axs = plt.subplots(2, 3)
fig.suptitle('Multiple images')
for i, ax in enumerate(axs.ravel()):
    im = ax.imshow(data[i], norm = colrz)
    ax.label_outer()

cb = fig.colorbar(im, ax=axs, orientation='horizontal', fraction=.1)

plt.show()

(I didn't re-record the video, it was orignally posted here and in order for changes made on the colorbar to propagate to the artist the Colorbar class needs some small changes [see point 1. in the list above], but changes on one of the images will propagate to all images and the colorbar as shown in the video.)
https://github.com/user-attachments/assets/093d5fa1-0dfc-4a68-9197-24acd0ef4aed

PR checklist

¹ we should consider the existing tests for cm.ScalarMappable and see if any of them should be replicated to test Colorizer and ColorableArtist.

@timhoffm
Copy link
Member

timhoffm commented Aug 5, 2024

Naming: Not sure ScalarMappableShim → ColorizerShim is meaningful. I would say, this a Shim for the ScalarMappable API and hence, ScalarMappable in the name makes sense.

Note 2: Yes, please move to a new file.

Note 3: All agreed. Note that I've already simplified https://matplotlib.org/devdocs/gallery/images_contours_and_fields/multi_image.html through #28545. This still needs adaption to the Colorizer API, but only after we've untangled the Artist backreferences so that multiple Artists can share the same Colorizer/Colorbar

lib/matplotlib/artist.py Outdated Show resolved Hide resolved
@trygvrad
Copy link
Contributor Author

trygvrad commented Aug 6, 2024

@timhoffm

Naming: Not sure ScalarMappableShim → ColorizerShim is meaningful. I would say, this a Shim for the ScalarMappable API and hence, ScalarMappable in the name makes sense.

My thoughts regarding the name have evolved as I have been working on the issue, which has led me to the following points:

  1. The name ScalarMappableShim makes sense in the context that we are replacing ScalarMappable, but in a few years time, with ScalarMappable possibly depreciated and new developers coming in, I don't think the name ScalarMappableShim will make much sense any more.
  2. The shim allows access to attributes on a Colorizer objects without referencing the object itself¹, and this functionality is useful for the end user, i.e. while we could depreciate ScalarMappable in the foreseeable future, I don't really see a future where we depreciate the shim.
  3. The shim will support both scalar and multivariate data, and for that reason I would like to avoid 'Scalar' in the name.

That said, I don't think the name is of much significance, so just tell me again and I'll change it back to ScalarMappableShim 🙂

¹ In my mind ColorizerShim well describes this use, but I may be misusing 'shim' as a technical term, as I am not familiar with the terminology.

@trygvrad
Copy link
Contributor Author

trygvrad commented Aug 6, 2024

@timhoffm
I moved Colorizer to a new file (this also makes the git diff a lot easier to read)

Also:
This still needs updated typing and tests, but I would like to holding off on spending too much time on this until I get a bit more feedback @tacaswell @ksunden @story645.

Comment on lines 1350 to 1351
if np.ndim(data) == 0 and (isinstance(self, ScalarMappable) or
isinstance(self, ColorizingArtist)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if np.ndim(data) == 0 and (isinstance(self, ScalarMappable) or
isinstance(self, ColorizingArtist)):
if np.ndim(data) == 0 and isinstance(self, (ScalarMappable, ColorizingArtist)):

interval ``[0, 1]``.
If a `str`, a `colors.Normalize` subclass is dynamically generated based
on the scale with the corresponding name.
If `cm.Colorizer`, the norm an colormap on the `cm.Colorizer` will be used
Copy link
Member

@timhoffm timhoffm Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not overload the signature. A typical strategy would be to have the canoncial signature and create a factory classmethod for the other signature:

def __init__(colorizer):
    self.colorizer = colorizer

@classmethod
def from_norm_and_cmap(cls, norm, cmap):
    return cls(Colorizer(norm, cmap)

AFAICT, the situation here is a bit different, because users do not instantiate ColorizingArtist directly, and hence, they won't use the factory method. We could do one of the two options:

  • only implement def __init__(colorizer). If subclasses need to initialize from norm + cmap, they'd simply do
    ColorizingArtist.__init__(Colorizer(norm, cmap))
    
  • create an alternative initializer
    def _init_from_norm_and_cmap(norm, cmap):
        self.colorizer = Colorizer(norm, cmap)
    

Given that we'll only have a handfull of subclasses and the solution is quite simple, I'm leaning towards the first option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first alternative makes sense to me.

However, we still need the logic where the user can pass a Colorizer object as part of the input to the plotting function.

I implemented this with the following function:

def _get_colorizer(cmap, norm):
    """
    Passes or creates a Colorizer object.

    Allows users to pass a Colorizer as the norm keyword
    where a artist.ColorizingArtist is used as the artist.
    If a Colorizer object is not passed, a Colorizer is created.
    """
    if isinstance(norm, Colorizer):
        if cmap:
            raise ValueError("Providing a `cm.Colorizer` as the norm while "
                             "at the same time providing a `cmap` is not supported.")
        return norm
    return Colorizer(cmap, norm)

Such that the initiation of a ColorizingArtist in e.g. Collection is:

        artist.ColorizingArtist.__init__(self, colorizer._get_colorizer(cmap, norm))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to pass a Colorizer via the norm parameter in public API? This feels very hacky. I'm inclined to add a colorizer parameter instead, even if that means we have mutually exclusive parameters. (Note that even with the hack, we'd have to live with partially mutually exclusive parameters, because you cannot pass norm=mycolorizer, cmap='jet').

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong opinions, and am inclined to agree with you @timhoffm.
Since this is a change to the call signature of the plotting functions, we should bring it up at a weekly meeting. @story645 could you add it on the agenda for the next meeting?

(My aversion to new keyword arguments is a result of a discussion at a weekly meeting a long time ago about multivariate color mapping, and how we do not want that to increase the number of keyword arguments, but this is a different case entirely.)

Copy link
Member

@story645 story645 Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't see this til now 😦 but also anyone can add stuff to the agenda (sorry me and Kyle didn't make that clear)! And also this got resolved as yes add new kwarg, I think.

@story645
Copy link
Member

story645 commented Aug 6, 2024

The shim will support both scalar and multivariate data, and for that reason I would like to avoid 'Scalar' in the name.

ColorizerMappable?

or ColorizerMappableInterface/colorizer.MappableInterface since this is aimed at providing an interface akin to the old Mappable Interface (I think?)

lib/matplotlib/artist.py Outdated Show resolved Hide resolved
lib/matplotlib/cm.py Outdated Show resolved Hide resolved
@@ -32,7 +32,7 @@
"linewidth": ["linewidths", "lw"],
"offset_transform": ["transOffset"],
})
class Collection(artist.Artist, cm.ScalarMappable):
class Collection(artist.ColorizingArtist, colorizer.ColorizerShim):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the colorizing artist gets built out/in later PRs, is the plan that the inherit from Shim goes away?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The shim provides the backward-compatible interface for the ScalarMappaple API. It‘s needed for the foreseeable future, but it’s reasonable to keep it separated from the ColorizingArtist API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So where I'm confused is that I understand the compatibility layer as public API for downstream libraries, I'm confused about compatibility layer for private API where we can use the thing directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean down the line, I understand for this and some foreseeable PRs the shim being here to show that the functionality is being maintained w/o having to reimplement everything.

lib/matplotlib/colorizer.py Outdated Show resolved Hide resolved
Comment on lines +74 to +103
try:
scale_cls = scale._scale_mapping[norm]
except KeyError:
raise ValueError(
"Invalid norm str name; the following values are "
f"supported: {', '.join(scale._scale_mapping)}"
) from None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code mirrors the existing code for ScalarMappable (cm.py lines 545→552), but I'll see if I can shift it to use the _api or if there is some reason why that has not been implemented for ScalarMappable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two points:

  1. This will change the error message from:
" Invalid norm str name; the following values are supported: 'linear', 'log', 'symlog', 'asinh', 'logit', 'function', 'functionlog'

to

' ' is not a valid value for norm; supported values are 'linear', 'log', 'symlog', 'asinh', 'logit', 'function', 'functionlog'

Which is slightly misleading, as norm supports other inputs than strings. We can mitigate this by intercepting the error message and replacing value with string, but this requires an additional try except block.

  1. I think this check naturally belongs in _auto_norm_from_scale, a function which is currently called by:
        elif isinstance(norm, str):
            try:
                scale_cls = scale._scale_mapping[norm]
            except KeyError:
                raise ValueError(
                    "Invalid norm str name; the following values are "
                    f"supported: {', '.join(scale._scale_mapping)}"
                ) from None
            norm = _auto_norm_from_scale(scale_cls)()

which I think should be replaced by:

        elif isinstance(norm, str):
            norm = _auto_norm_from_str(norm)()

(note the change in function name from _scale to _str)
This function then incorporates the check.

def _auto_norm_from_str(norm_str):
    scale_cls = _api.check_getitem(scale._scale_mapping, norm=norm_str)
    ...

@timhoffm @story645 Let me know your opinions on point 1 and 2 above, and I will implement it in this PR.
Point 2. would be a useful change with regards to making a 'MultiNorm' in a later PR.

Copy link
Member

@timhoffm timhoffm Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the message would be slightly misleading. OTOH pushing this down does not make much better for the user. Thy still get the same exception message. Only the call stack is different. While the message in the deeper context is correct, it's further away from the public API and thus more difficult to trace back.

An alternative would be expanding _check_get_item so that we can do _api.check_getitem(scale._scale_mapping, norm=norm_str, _type="str") and get "' ' is not a valid str value ...", which would imply there may be other valid types.

I believe in practice, it does not matter. If the user has passed an invalid str, they most likely have a typo and still wanted to pass a str name. It's unlikely that the information being able to pass Norms directly would be helpful. To keep things simple, I slighly favor just using _check_get_item as is in original context (this does not factor in any possible future needs). But overall no strong opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point 2. would be a useful change with regards to making a 'MultiNorm' in a later PR.

Would it be easy to make this change inside the MultiNorm PR or better to have it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to have it here, with the changes in #28690 that means we only have to move this function once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would changing the get item check method in the way Tim described achieve the same ends? If so do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, unfortunately I think moving it twice but making the change when folks can see why it helps is gonna be an easier sell.

lib/matplotlib/colorizer.py Outdated Show resolved Hide resolved
@trygvrad trygvrad force-pushed the Colorizer-class branch 2 times, most recently from c5393d5 to 2d4ae15 Compare August 8, 2024 09:27
"converted to float")

self._A = A
if not self.norm.scaled():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure that this will always have a .norm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking it will always have a norm, as I think that simplifies the implementation.
However, that norm might be NoNorm.

The implementation here mirrors that of ScalarMappable, which always has a Normalize and Colormap object, and I think this is sensible. For images that are already RGB(A) a norm (and cmap) is technically unnecessary, but I believe the overhead of creating them is negligible.

@tacaswell
Copy link
Member

The shim provides the backward-compatible interface for the ScalarMappaple API. It‘s needed for the foreseeable future, but it’s reasonable to keep it separated from the ColorizingArtist API.

The norm and cmap properties only ever make sense with the ColorizerArist as the mix-in assumes that something else is providing the .colorizer attribute. I'm very unconvinced that splitting it off into its own class does more than make our multiple inheritance look worse and add some extra ceremony. If we do want to have a version of CololizerArtist without the back-compat API can we call it ColorizerCore or something like that and have a sub-class (that we use everywhere) that include the properties?

@timhoffm
Copy link
Member

The shim provides the backward-compatible interface for the ScalarMappaple API. It‘s needed for the foreseeable future, but it’s reasonable to keep it separated from the ColorizingArtist API.

The norm and cmap properties only ever make sense with the ColorizerArist as the mix-in assumes that something else is providing the .colorizer attribute. I'm very unconvinced that splitting it off into its own class does more than make our multiple inheritance look worse and add some extra ceremony. If we do want to have a version of CololizerArtist without the back-compat API can we call it ColorizerCore or something like that and have a sub-class (that we use everywhere) that include the properties?

My motivation was to have a separation between "API we have to provide to ensure backward compatibility" and "API we want to provide". At least during the design phase this could make it more clear. We also need to decide at some point whether there is API that we want to discourage. This is all possible also with a single large class, but I have the impression that that is more mental load and we have to be extra careful to not miss anything. But no strong opinion. If you think you can keep the overview, that's ok.

@trygvrad
Copy link
Contributor Author

I'm going to make _ColorizingInterface as a base class which contains the functions that are now part of the shim.
Then I will make ColorizingArtist inherit from _ColorizingInterface and Artist
This has two advantages:

  1. It separates the functions that do things on the ColorizingArtist from the functions that only delegate to the Colorizer
  2. We can let ScalarMappable inherit from _ColorizingInterface so that we don't need to keep two versions of these functions around (for as long as ScalarMappable is around).

Once/if ScalarMappable is deprecated we can evaluate if it makes sense to remove _ColorizingInterface as a separate class.

lib/matplotlib/artist.py Outdated Show resolved Hide resolved
@@ -4696,7 +4697,7 @@ def invalid_shape_exception(csize, xsize):
label_namer="y")
@_docstring.interpd
def scatter(self, x, y, s=None, c=None, marker=None, cmap=None, norm=None,
vmin=None, vmax=None, alpha=None, linewidths=None, *,
vmin=None, vmax=None, colorizer=None, alpha=None, linewidths=None, *,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to go after the * or we break API (even though passing 10 positional arguments is a Bad Idea ™️ if someone was doing that we would now get the value of alpha as colorizer).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side-remark: We should consider tightening on the kw-only settings. It helps the understanding to collect related parameters next to each other. I'd be willing to force people with Bad Ideas™️ to adjust in this case.

But this doesn't help for now because we'd need to go through the deprecation cycle. So @tacaswell's comment is valid anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback, I was unsure if it was OK to break the list of args for people with Bad Ideas™️, but I guess we should cater to everyone :)

@QuLogic
Copy link
Member

QuLogic commented Oct 22, 2024

The doc failure appears relevant, unfortunately.

@tacaswell tacaswell merged commit 51e8f7f into matplotlib:main Oct 24, 2024
41 of 42 checks passed
@tacaswell
Copy link
Member

Thank you @trygvrad ! Congrats on driving a pretty big change to Matplotlib

@story645
Copy link
Member

story645 commented Nov 1, 2024

Just noting here that this needs a what's new entry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

6 participants