Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MNT]: Move norms to from colors.py to separate file norms.py #28690

Open
trygvrad opened this issue Aug 9, 2024 · 18 comments
Open

[MNT]: Move norms to from colors.py to separate file norms.py #28690

trygvrad opened this issue Aug 9, 2024 · 18 comments

Comments

@trygvrad
Copy link
Contributor

trygvrad commented Aug 9, 2024

Summary

Currently the data→norm→color pipeline involves two files:

  1. colors.py:
    1. Utility functions for working with colors
    2. Colormap and subclasses of Colormap
    3. Normalize and subclasses of Normalize
  2. cm.py:
    1. The colormap registry
    2. ScalarMappable (holds data, norm and colormap)

In this structure, each file contains multiple parts of the pipeline, and it is not intuitive which part of the pipeline is located where.

With PR #28658 we are creating colorizer.py and the new Colorizer object to improve the data→norm→color pipeline.
This has the added benefit that it greatly simplifies cm.py, which now only contains the colormap registry (and ScalarMappable for compatibility reasons).

Proposed fix

This issue suggests moving the norms to a separate file in order to clean up colors.py. This, together with #28658 would lead to the following structure:

  1. colors.py:
    1. Utility functions for working with colors
    2. Colormap and subclasses of Colormap
  2. norms.py
    1. Normalize and subclasses of Normalize
  3. cm.py:
    1. The colormap registry
  4. colorizer.py
    1. Colorizer (handles the data→norm→color pipeline)

With this structure, each file only contains a single part of the pipeline.

This issue came up because I was looking to implement multivariate color mapping (see issue #14168, and PRs #28454, #28658, [#28428]) , and for this we will need at least one more norm (i.e. MultiNorm).

@trygvrad
Copy link
Contributor Author

trygvrad commented Aug 9, 2024

At the weekly meeting Aug. 8th 2024, this issue was disussed, and there was a general consensus that this is a good idea from a maintenance/development point of view, but there are potential drawbacks with regards to backwards compatibility.

Three options were discussed:

  1. Move the norms to norms.py
    • In colors.py, import the currently existing norms:
      from .norms import Normalize, TwoSlopeNorm, CenteredNorm, FuncNorm, NoNorm, AsinhNorm, PowerNorm, BoundaryNorm, LogNorm, SymLogNorm
    • New norms are only accessible via norms.py
    • A long (or indefinite) depreciation period for the imports of norms in colors.py
  2. Move the norms to a private file _norms.py.
    • In colors.py, import all norms (existing and new)
    • All access to the norms should still go via colors.py
    • This cleans up colors.py for development, but does not change anything for the users (for better or worse)
  3. Leave the norms in colors.py.
    • I.e. no changes.

At the meeting there was a general agreement that the current structure of cm.py and colors.py is confusing, and that moving the norms could help remedy this.
More importantly, there was a consensus that we would like to hear the opinion of the API lead @timhoffm on this.

see the meeting notes https://hackmd.io/@matplotlib/HkY6emxs2

@timhoffm
Copy link
Member

timhoffm commented Aug 9, 2024

I fully agree that the current structure and naming is confusing. I never remember where the different parts are. I think the colors module should only contain basic color tooling (the sections Other classes, Functions and Exported colors from https://matplotlib.org/stable/api/colors_api.html).

All colormapping-related stuff should be separate. Disregarding for a moment where we are now, there are fundamentally three approaches one could take:

  • One large namespace colormapping that includes all relevant components (Norm, Colormap, Colorizer, ColorizingArtist, ColormapRegisty)
  • Separate namespaces norms, colorizer
  • Separate sub-namespaces under a common colormapping namespace.

From a user and API perspective, I find one large colormapping namespace attractive. It signals that these things belong together, one does not need to remember and import 5 different namespaces and its flat. We could try putting all code in there, but I suspect that file would become very large. The alternative would be to make private sub-namespaces matplotlib.colormapping._colorizer etc. and re-import them into matplotlib.colormapping. This can be a slight hassle as the public API is not the same as the fully qualified class name, but we can cope with this and e.g. do this with Axes as well.

How to get there?

  • Create the new structure
  • matplotlib.cm stays but becomes completely legacy. New code should not need to import it.
  • matplotlib.colors will stay the public API for the basic color (non-mapping) topics. It will re-import Colormap and Normalize for backward-compatibility.

@timhoffm
Copy link
Member

timhoffm commented Aug 9, 2024

I fully agree that the current structure is confusing. I can never remember where I find the individual parts. I suggest the following refactoring:

  • colors.py should be reduced to the non-colormapping topics, basically Other classes, Functions and Exported colors from https://matplotlib.org/stable/api/colors_api.html.

  • All colormapping-related functionality should be collected under one matplotlib.colormapping namespace. I believe it is too large to accomodate all aspects in a single namespace. That would get unwieldy both in the codebase and in the docs. I therefore suggest to make submodules for the relevant parts: matplotlib.colormapping.norms etc. To make life easier for users, all classes of subcomponents should be re-exported in the matplotlib.colormapping namespace, and we should recommend to use from matplotlib.colormapping import XYZ instead of importing from the submodules.

    Side note: At first I thought to make the submodules private, but now decided against that. We can be transparent about the substructure and that the importing is only a shortcut. The fully qualified class names contain the submodules anyway so we cannot completely hide that the classes are not defined in matplotlib.colormapping. Also, showing the substructure makes documentation easier: One can immediately see which types of components are involved and document them in their respective sub-module context. While not the place for a full documentation of the concepts, the matplotlib.colormapping docstring can give a short summary of the involved components and how they play together. The only disadvantage is that we make the submodules public API and thus harder to change. But they directly follow the logic structure so that I don't see that we would need to change the submodules unless we'd overhaul the complete colormapping architecture again.

How to get there:

  • Create the new structure and move all relevant code there.
  • colors.py will keep the basic non-mapping color functionality and re-export the colormapping classes that it contained so far for backward-compatibility.
  • cm.py will become completely legacy. New code should not need to import it anymore. It will also re-export the colormapping classes that it contained so far for backward-compatibility.

The backward-compatibility will be indefinite for now.

@trygvrad
Copy link
Contributor Author

trygvrad commented Aug 9, 2024

@timhoffm

I would just like to confirm that I understand you correctly, does this agree with what you have in mind?

  1. colors.py:
    1. Utility functions for working with colors
    • re-import Colormap and Normalize for backward-compatibility
  2. norms.py
    1. Normalize and subclasses of Normalize
  3. colorizer.py
    1. Colorizer (handles the data→norm→color pipeline)
  4. colormaps.py
    1. Colormap and subclasses of Colormap
    2. [with MultivarColormap and BivarColormap #28454 BivarColormap, MultivarColormap]
  5. colormapping.py:
    1. The colormap registry
    2. ColorizingArtist
    3. Imports the classes and functions from files 1.-4.
  • cm.py:
    • completely legacy

@timhoffm
Copy link
Member

timhoffm commented Aug 9, 2024

colormapping should be a package containing submodules, i.e.:

matplotlib/
    ...
    colors.py
    colormapping/
        __init__.py
        norms.py
        colorizer.py
        colormaps.py
        ...
    cm.py

I'm inclined to keep content out of colormpping/__init__.py. That means also ColorizingArtist should get its own submodule - or put ColorizingArtist and Colorizer into one submodule colorizer.py / colorizing.py (I've not thought about all the details yet). The ColormapRegistry may live in colormaps.py. We might consider moving colorbar.py into colormapping for consistency, but that can be decided later. It would be a complete move + re-export under the old module name for compatibility.

Edit: Should we use singular module names norm.py, colormap.py?

@trygvrad
Copy link
Contributor Author

trygvrad commented Aug 9, 2024

I see, then this should probably all wrap into #28658.


I would personally prefer plural names for the files that should include multiple classes in the same family:

from norms import Normalize, LogNorm
from colormaps import Colormap, LinearSegmentedColormap

vs:

from norm import Normalize, LogNorm
from colormap import Colormap, LinearSegmentedColormap

But colorizer.py can be singular, because there should not need to be multiple colorizer classes:

from colorizer import Colorizer

If you think this is too granular, then we do everything singular.

@story645
Copy link
Member

story645 commented Aug 9, 2024

One large namespace colormapping that includes all relevant components (Norm, Colormap, Colorizer, ColorizingArtist, ColormapRegisty)

Love the broad strokes, am worried that since we already associate colormap/colormapping with the colormaps and not the process, that also naming the process/process namespace colormapping may confuse the users who have learned our terminology. It would also mean we'd have to update docs to make clear that colormapping is now how we refer to the process.

What about keeping colors as the broad grouping and then move out all those utils functions into _colors.py? Basically colors gets elevated to the namespace? Mostly cause that's how we already group in docs:

ETA: what I mean is a structure like:

matplotlib/
    ...
    colors/
        __init__.py
        _colors.py
        norms.py
        colorizer.py
        colormaps.py
        ...
    cm.py

@trygvrad
Copy link
Contributor Author

I see your point @story645 but I find colors to be a bit too ambiguous of a name for what is much more specific.

I'm leaning towards colormapping being an OK name despite being a bit long, because it makes intuitive sense to import it in the following way:

from matplotlib import colormapping as mcm

and while colormapping is too long to be convienient as a namespace, I find mcm to be suitable.
On the other hand mcm makes for a bad filename, just as cm, so I would like to use colormapping as the name, and then importing it as mcm.

@story645
Copy link
Member

so I would like to use colormapping as the name, and then importing it as mcm

By convention, cm is imported as mcm 😓

A shorter alternative is coloring but my guess is that coloring and colors will confuse everyone.

And yeah, I'm somewhat neutral/am not gonna put up a fight on this b/c I get that you're trying to isolate this one very specific pipeline.

@timhoffm
Copy link
Member

I see the point in suggesting to collect everything under colors. It would also have the advantage that norms and colormap classes would not change their location.

These are strong advantages. However, I‘m feeling uneasy with the idea. It feels imbalanced: First, the non-mapping code is small compared to the mapping architecture. It‘s a bit lost. Second, I‘d like the public namespace be purely re-imports (also thinking in docs organization: These are the different parts documented in detail on the sub-pages). Adding documentation for the few non-mapping functions is imbalanced). OTOH it also does not make sense to move that documentation to a public subnamespace (matplotlib.colors.colors - or slightly better matplotlib.colors.utils).

Second thoughts: It might be possible to get the structure resolved by de-emphasizing the sub-namespaces and making the colors namespace docs an index similar to https://matplotlib.org/devdocs/api/axes_api.html#module-matplotlib.axes, where the non-mapping functions are one section and link to individual doc pages per function. But I‘d have to have a closer look at that.

@timhoffm
Copy link
Member

To keep things simple and better separated, I suggest to start with the separate colormapping namespace approach. Until release we can always move things around without too much effort (consciously risking that the approach we first choose may stay because it's good enough and we don't want to reorganize later). Anybody, please speak up if you disagree.

@trygvrad
Copy link
Contributor Author

So I made an attempt at implementing this.
The structure I was going for was this:

colormapping/
    __init__.py
    _colormap_data.py  # _cm.py
    _colormap_luts.py   # _cm_listed.py
    colorbar.py
    colorizer.py
    colormap_classes.py
    colormap_registry.py
    lighting.py
    norms.py
colors.py

where colormapping handles everything that goes $data\rightarrow color$ while color cycles (for line plots etc.) remain in colors.py

I found some drawbacks to this structure.

  1. It quickly leads to circular imports.
    1. This appears to be be related to colorizer.ColorizingArtist requires artist.py to be loaded, but artist.py cannot be loaded unitl rcparams are loaded, which is after the colormap registry (?). Therefore, you cannot load the entirety of colormapping in one go.
    2. This is also related to Colormap requiring to_rgba() from colors.py, and colors.py needing to import Colormap (for compatibility)
  2. from matplotlib.colormapping.norms import Normalize is a very long and cumbersome string to remember. from matplotlib.colormapping import Normalize would be better, and could be accessible if colormapping.__init__.py could import all the relevant classes functions from its members (much in the same way as axes/__init__.py imports Axes), but this is not possible because of issue 1.

At the core of this is the fact that different aspects of colors.py and cm.py are heavily interconnected.

While these are not insurmountable issues, I don't see a solution that does not rely on workarounds (i.e. calling imports in file A from file B to avoid circular imports), or an even more brutal refactoring/renaming operation.


I raised this issue because colors.py is difficult to parse and contains several different things that appear to be unrelated.
While I have found that making a submodule for the entire color mapping pipeline is difficult, there are some parts of color.py that can be easily separated.

I propose the following:

  1. Move Normalize and subclasses to _norms.py
  2. Move Colormap and subclasses to _colormap_classes.py
  3. In colors.py do:
from ._norms import \
    (Normalize, NoNorm, TwoSlopeNorm, CenteredNorm,  # noqa: F401
     FuncNorm, LogNorm, SymLogNorm, AsinhNorm,
     make_norm_from_scale, PowerNorm, BoundaryNorm)

from ._colormap_classes import \
    (Colormap, LinearSegmentedColormap, ListedColormap)  # noqa: F401

All access to the norms and colormaps still goes through colors.py, so there is no change to the API, but I believe it will be easier to parse and maintain the codebase.

Additionally this brings us closer to the structure @story645 suggested, with colors as a submodule, so that this can be developed further in a later PR.
It also structures the contents of colors.py, which improved docs can be built upon, i.e. if the docs for colors.py are changed to an index as @timhoffm suggested.

@jklymak
Copy link
Member

jklymak commented Aug 14, 2024

I'm only half following this, but I constantly need to remind myself where the norms are stashed. I wouldn't look for them under "color". At least now they are under "cm" which has "mapping" in the name.

I don't see what the problem with having a top-level norms.py is. While I agree that 99.99% of the time they are used for colormapping, they are just mathematical transforms.

@story645
Copy link
Member

I don't see what the problem with having a top-level norms.py is. While I agree that 99.99% of the time they are used for colormapping, they are just mathematical transforms

I agree here - I proposed colors as top level as an alternative to colormapping, but our docs emphasize norms as their own distinct thing enough that I'm cool with them being a top level module if that makes this all less of a hassle.

The rcParam issue can be solved via delayed evaluation til run time kinda like how we manage backends, but I seriously don't think it's worth it.

elif key == "backend" and self is globals().get("rcParams"):

@trygvrad
Copy link
Contributor Author

I'm only half following this, but I constantly need to remind myself where the norms are stashed. I wouldn't look for them under "color". At least now they are under "cm" which has "mapping" in the name.

I don't see what the problem with having a top-level norms.py is. While I agree that 99.99% of the time they are used for colormapping, they are just mathematical transforms.

I completely agree, both on norms being hard to find and that norms are their own thing.
To your first point, they are currently in colors.py, not cm.py, but I would strongly prefer them being in norms.py.

@timhoffm
Copy link
Member

I don't see what the problem with having a top-level norms.py is. While I agree that 99.99% of the time they are used for colormapping, they are just mathematical transforms.

The problem I see is that we currently have 60 subnamespaces. If we through the ~6 colormapping related namespaces flat in, it's harder to see that they play together. That's like throwing a jigsaw puzzle consisting of 6 pieces into a large bag with other pieces - you'll not find the pieces belonging together again.

In my eyes, the "just mathematical transform" is not an argument. While technical true, we clearly associate it with the colormapping. IMHO that's helps users build a mental model of the architecture. Also, I would not exclude that we might want to add aspects to norms in the future that are colormapping specific and break the "just mathematical transform" assumption. I also would strongly advise users not to use norms for other purposes than colormapping.

It's an architecture shortcoming if we cannot organize the module structure as we would like to due to technical reasons (dependencies). I still have to determine whether this indicates that our planned structure was not good, or whether we've issues in the current organization of code.
I would not want to exclude a structure, just because it leads to import problems right now. But this needs additional thoughts and I don't want to block the colormapping topic over it. I therefore approve @trygvrad's proposal to make _norms.py and _colormap_classes.py submodules in color.py (turning that into a package). It does not yet introduce any new structures for them. But it separates these logical entities out and will make it easier to move them into another place in the module structure later if needed.

@jklymak
Copy link
Member

jklymak commented Aug 15, 2024

The problem I see is that we currently have 60 subnamespaces.

As an end-user, I only use a very few of those 60 though - pyplot.py and a very small handful of others. I just checked, and I have a 20 figure paper worth of analysis where the only import I use is pyplot, and the figures are not trivial. A quick skim through the examples brings up very few instances where a user needs to import anything other than pyplot, which is a good thing in my opinion.

However, occasionally I will use a LogNorm, and every time I need to google what module it is in because colors.py does not evoke "norm" to me. I know the thing I need is a "norm", I don't think it's crazy to think that matplotlib.norm would be where I would find it. If one were to store them somewhere, and argue consistency, keeping the norms with colormaps would make the most sense since they are usually used together.

Ideally I'd just make norms available via pyplot or top-level matplotlib. However, persisting in keeping them in colors.py during a reorganization is not very user-friendly in my opion.

@timhoffm
Copy link
Member

Totally agree, it’s good you usually only need one or very few imports.

But if you want to know where to find something or how the library is structured, https://matplotlib.org/devdocs/api/index.html#modules is a canonical entry point.

However, persisting in keeping them in colors.py during a reorganization is not very user-friendly in my opinion.

That’s not the topic. What the current proposal does, is enabling @trygvrad to continue with the Colorizer changes without having to decide right now where to put norms eventually. There will be final module structure considerations before the official release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants