
Data dimensionality and axes metadata #35

Closed
constantinpape opened this issue Feb 28, 2021 · 43 comments

@constantinpape
Contributor

In last week's meeting the question of data dimensionality came up again (in the morning it was raised by @jni, and I think it came up in the afternoon as well).
Currently, the spec demands that all data is 5 dimensional (I think with axis order TCXYZ, but I am not quite sure).

Do we want to lift the restriction and allow data of lower dimensionality? In this case, we would add metadata in multiscales to describe the axes (e.g. "axes": ["x", "y", "z"]).

Note that this is also important for the transformation spec #28, where we need to clarify which axes a transformation applies to.

Independent of the decisions, we should add a field that describes the physical units of the axes, e.g. "units": ["micrometer", "micrometer", "micrometer"].
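A minimal sketch of what lifting the 5d restriction would mean for a writer. It assumes the proposed `axes` and `units` fields sit directly in each `multiscales` entry; the function name and the `datasets` layout are illustrative, not taken from the spec. The number of axis names has to match the array's dimensionality, and units, if given, pair up with axes one-to-one.

```python
import json
from typing import Optional, Sequence


def make_multiscales(shape: Sequence[int], axes: Sequence[str],
                     units: Optional[Sequence[str]] = None) -> dict:
    """Build a hypothetical multiscales entry for data of any dimensionality."""
    if len(axes) != len(shape):
        raise ValueError(
            f"{len(shape)}-d data needs {len(shape)} axis names, got {len(axes)}")
    if units is not None and len(units) != len(axes):
        raise ValueError("expected exactly one unit per axis")
    entry = {"axes": list(axes), "datasets": [{"path": "0"}]}
    if units is not None:
        entry["units"] = list(units)
    return {"multiscales": [entry]}


# A 3d volume no longer needs to be padded to 5d.
meta = make_multiscales((64, 512, 512), ["z", "y", "x"], ["micrometer"] * 3)
print(json.dumps(meta))
```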

@jni

jni commented Mar 1, 2021

(I think with axis order TCXYZ, but I am not quite sure).

TCZYX. ;)

Do we want to lift the restriction and allow data of lower dimensionality?

Yes please! And in fact the axis names should be whatever; I think we should not be limited to subsets of "TCZYX". E.g. they could be ["lat", "lon"] or ["left-right", "superior-inferior", "anterior-posterior"].

@joshmoore
Member

@jni: what behavior would you expect for an array with no x or y?

@tischi

tischi commented Mar 2, 2021

Based on how the discussion in #28 evolved, I guess the axis names may be part of the specification of the transformation from data space to physical space, right?

@tischi

tischi commented Mar 2, 2021

what behavior would you expect for an array with no x or y?

@joshmoore What do you mean by "behavior", maybe "how it would be rendered in a viewer"?

@d-v-b
Contributor

d-v-b commented Mar 2, 2021

@jni: what behavior would you expect for an array with no x or y?

In my opinion, a generic image viewer should have no intrinsic opinion about the particular axis names of the data it displays. If the user has 2D data with axes labelled X and B, then the viewer should display the data (with a default, but overrideable, mapping from data coordinates to viewer coordinates) as an image with one axis labelled "X" and the other axis labelled "B". If the data axis labelled "X" happens to be mapped to a display axis also called "X", then that is just a happy coincidence. A general-purpose data visualization tool should not assign any "meaning" to an axis name like "X" or "T". A more specialized tool might have an opinion about axis names, though.
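The "default, but overrideable" mapping can be sketched so that the viewer never interprets axis names. This assumes a simple, hypothetical convention: the last data axis is shown horizontally, the second-to-last vertically, and everything else goes on sliders. The role names and the `override` parameter are also illustrative.

```python
from typing import Dict, List, Optional, Sequence, Union


def display_mapping(axis_names: Sequence[str],
                    override: Optional[Dict[str, str]] = None
                    ) -> Dict[str, Union[str, List[str]]]:
    """Assign data axes to display roles; names are treated as opaque labels."""
    if len(axis_names) < 2:
        raise ValueError("an image display needs at least two axes")
    # Default: last axis horizontal, second-to-last vertical.
    roles = {"horizontal": axis_names[-1], "vertical": axis_names[-2]}
    # A user override replaces a role's axis, but only with an existing axis.
    for role, axis in (override or {}).items():
        if role not in roles or axis not in axis_names:
            raise ValueError(f"cannot map {role!r} to {axis!r}")
        roles[role] = axis
    if roles["horizontal"] == roles["vertical"]:
        raise ValueError("horizontal and vertical must show different axes")
    shown = set(roles.values())
    # Every remaining axis gets a slider, in data order.
    sliders = [a for a in axis_names if a not in shown]
    return {**roles, "sliders": sliders}


print(display_mapping(["X", "B"]))
# {'horizontal': 'B', 'vertical': 'X', 'sliders': []}
```

The "X" label carries no special meaning here. It simply ends up on whichever display role the convention or the user assigns.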

@tischi

tischi commented Mar 3, 2021

then the viewer should display the data (with a default, but overrideable, mapping from data coordinates to viewer coordinates)

The way I interpreted the status of our discussion at #28 is that there is no default mapping, but a mapping must be always provided, or did I get this wrong?

@d-v-b
Contributor

d-v-b commented Mar 3, 2021

Ah, sorry for causing confusion (and maybe we are straying away from the original question @joshmoore posed) -- Yes, I have the same interpretation of the discussion in #28. My (confusingly stated) point in the comment above was just that general purpose data visualization tools shouldn't have an opinion / preference for specific axis names in the transform metadata.

@constantinpape
Contributor Author

The way I interpreted the status of our discussion at #28 is that there is no default mapping, but a mapping must be always provided, or did I get this wrong?

I think this is still up for discussion. @axtimwalde made the point that a missing transformation could simply be interpreted as the identity transform, and missing axis labels would mean that the data stays in pixel space.
This has the advantage that it's a non-breaking change.
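A sketch of that fallback. For illustration it assumes the transformation is just a per-axis `scale` list, a hypothetical stand-in for whatever #28 settles on. When no transformation is given, coordinates pass through unchanged and stay in pixel space.

```python
from typing import Optional, Sequence, Tuple


def to_physical(index: Sequence[float],
                scale: Optional[Sequence[float]] = None) -> Tuple[float, ...]:
    """Map a voxel index to physical coordinates with an optional per-axis scale."""
    if scale is None:
        # No transformation given: treat it as the identity (pixel space).
        return tuple(float(i) for i in index)
    if len(scale) != len(index):
        raise ValueError("scale must have one entry per axis")
    return tuple(i * s for i, s in zip(index, scale))
```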

@joshmoore what do you think about also allowing 2d, 3d and 4d data to be saved? I think this is the first important decision needed to drive #28 (and probably other discussions) forward.

@tischi

tischi commented Mar 4, 2021

And in fact the axis names should be whatever; I think we should not be limited to subsets of "TCZYX". E.g. they could be ["lat", "lon"] or ["left-right", "superior-inferior", "anterior-posterior"].

@jni based on the state of the discussion in #28, I now wonder whether your comment is about axis names in data space or in physical space. Currently, I would think we simply have no axis names at all in data space. In physical space I think it is nice to know which axis should be the "x" axis, so that the viewer can display the data accordingly. Thus I think this information should be there.

On top of specifying which one is the "x" axis, we could think of having optional axis_names metadata, something like:

"axis_names" : { "x" : "anterior-posterior", "y": "dorsal-ventral" }

Would that work for you?

@tischi

tischi commented Mar 4, 2021

I think this is still up for discussion. @axtimwalde made the point that no transformation could just be interpreted as identity transform. And no axes labels would mean that the data stays in pixel space.

I think I'd prefer that specifying the axes labels is required, because in practice it makes a big difference whether one displays 3D data as xyz or xyc 😉 Unless we agree that specifying nothing defaults to axes of "type" : "space" with some default order like xyz.

@jni

jni commented Mar 5, 2021

@tischi as mentioned on #28 we do not want to prescribe here where physical axes go on the screen. There is a third space, which is the screen space, and all kinds of transformations can happen between physical/world space and screen space, not least of which is a 3D -> 2D projection.

I also don't think axis label specification should be a requirement; it should rather be strongly encouraged metadata. As mentioned by others, making it a requirement would make the spec not backward-compatible. Indeed, treating channels as spatial by default is fine: most viewers have the ability to separate out channels. (napari notably doesn't 😅 but we are definitely planning it!)

@tischi

tischi commented Mar 5, 2021

I also don't think axis label specification should be a requirement, but a strongly encouraged metadata

OK, I guess I could live with "strongly encouraged" 😉

@tischi

tischi commented Mar 5, 2021

Indeed, treating channels as spatial by default is fine: most viewers have the ability to separate out channels.

@jni I get the point about requirements and backwards compatibility. But in practice, let's say the vision is to be able to chain a set of napari plugins into an image processing workflow. My feeling is that it may be necessary to require knowing which axes are spatial and which axis is the channel axis. What do you think?

@joshmoore
Member

#35 (comment) @joshmoore what do you think about allowing to save also 2d, 3d and 4d data.

I've been working under the assumption that it would eventually be necessary (cf. the IMS file structure). It certainly has the potential to complicate and possibly slow down implementations, so I'd just urge balancing how soon it's introduced against the immediate need.

@joshmoore
Member

joshmoore commented Mar 5, 2021

On the topic of XYZ or not necessarily XYZ, I have some concern that not having these takes us outside the realm of OME-* specs and closer to the underlying numpy/zarr/n5/etc. specs, which is fine, but is something we should consider. If the axes are named arbitrarily, then quite possibly the axes metadata SHOULD additionally define which are orthogonal to one another and in what right-handed order.

cf. (har) http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#coordinate-types

Edit: ah, I see while working through issues that this also came up in #28 (comment)
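One way to make "orthogonal to one another and in what right-handed order" checkable is sketched below. It assumes each named spatial axis carried a direction vector, which is a hypothetical addition and not something proposed verbatim here. A consumer could then test pairwise orthogonality and the sign of the scalar triple product.

```python
from typing import Sequence, Tuple

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def is_right_handed_orthogonal(u: Vec, v: Vec, w: Vec, tol: float = 1e-9) -> bool:
    """True if u, v, w are pairwise orthogonal and (u x v) . w > 0."""
    if any(abs(dot(a, b)) > tol for a, b in ((u, v), (u, w), (v, w))):
        return False
    # A positive triple product means the order u, v, w is right-handed.
    return dot(cross(u, v), w) > tol
```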

@constantinpape
Contributor Author

I've been working under the assumption that it would eventually be necessary (cf. the IMS file structure). It certainly has the potential to complicate and possibly slow down implementations, so I'd just urge balancing how soon it's introduced against the immediate need.

As this is quite a big change and has implications for other parts of the spec, I would argue that it should be made sooner rather than later if deemed necessary.
For example, I am pretty sure that the transformation spec will look different depending on whether we decide on fixed 5d data or on 2d, 3d, 4d and 5d data.

On the topic of XYZ or not necessarily XYZ, I have some concern that not having these takes us outside the realm of OME-* specs and closer to the underlying numpy/zarr/n5/etc. specs, which is fine, but is something we should consider. If the axes are named arbitrarily, then quite possibly the axes metadata SHOULD additionally define which are orthogonal to one another and in what right-handed order.

I personally also think we shouldn't allow for arbitrary axis naming and stick to XYZCT.

@joshmoore
Member

I personally also think we shouldn't allow for arbitrary axis naming and stick to XYZCT.

To be clear, I can certainly imagine having additional axes. But if there is no traditional X, Y, or Z axes in a given zarray, I don't know if I would consider it an image in the sense that is currently defined in this repository. (If anyone has a counter-example, I'd love to hear it.)

@d-v-b
Contributor

d-v-b commented Mar 5, 2021

Medical imaging often uses anatomical coordinates, which do not involve the letters "X", "Y", or "Z": https://www.slicer.org/wiki/Coordinate_systems

@joshmoore
Member

@d-v-b: I guess I'm less concerned with naming, that's "just metadata". ;) But in all three you are in a 3D, right-handed coordinate system, right? I guess in my head (forgive me if I'm being biased) the ALS and IJK coordinate systems from slicer.org could be equated to XYZ, and then one just needs to provide which system one is in.

For comparison, in the high-content screening case, there are rows and columns, but there's additional metadata to say that the rows are letters and the columns are numbers.

@tischi

tischi commented Mar 5, 2021

I think the axes metadata part of this issue became quite overlapping with the discussion in this issue: #28, where the last posts were also about the handedness of the coordinate system and how much we want to commit to x, y, and z. Could it therefore make sense to continue this discussion on axes metadata in #28 and here just discuss how many data dimensions we would like to support?

@axtimwalde

A data format that supports only 5 dimensions is asking to be obsolete within 2 weeks ;).

@glyg
Contributor

glyg commented Mar 7, 2021

As a concrete case of more-than-5-D data, a team here is developing polarization microscopy, so each pixel has 7 coordinates: 3 spatial, the 3 components of the polarization vector, and time. Of course you can store the polarization as channels, but it gets tricky to encode a transformation then, as for example a rotation needs to apply to both the spatial and polarization coordinates.
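A minimal sketch of why storing the polarization as plain channels is awkward: a rotation of the sample has to be applied to the spatial block and the polarization block alike, while time is left alone. The sketch assumes the 7 coordinates are laid out as `[x, y, z, px, py, pz, t]` with Cartesian polarization components. That layout is hypothetical, and for brevity it uses a rotation about the z axis.

```python
import math
from typing import List, Sequence


def rotate_z(v: Sequence[float], angle: float) -> List[float]:
    """Rotate a 3-vector about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return [c * x - s * y, s * x + c * y, z]


def rotate_sample(point: Sequence[float], angle: float) -> List[float]:
    """Apply the same rotation to position and polarization; keep time unchanged."""
    if len(point) != 7:
        raise ValueError("expected 7 coordinates: x, y, z, px, py, pz, t")
    return rotate_z(point[0:3], angle) + rotate_z(point[3:6], angle) + [point[6]]
```

If px, py, pz were just three unrelated channels, a generic transformation spec would have no way to express that they must rotate together with x, y, z.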

@constantinpape constantinpape self-assigned this Mar 7, 2021
@constantinpape
Contributor Author

Ok, so I think dropping the requirement for 5d is not really controversial, whereas there's still some discussion about the axes labels.

I have been thinking a bit about how to drive the spec forward, and I think it would make most sense to start with a rather small change:

  • Move the metadata spec from the zarr-specs/issues to the ngff spec, as that's a prerequisite for any changes to the metadata spec.
  • Lift the restriction to 5d.

What do you think @joshmoore? I can start working on this.

@joshmoore
Member

#35 (comment) A data format that supports only 5 dimensions is asking to be obsolete within 2 weeks ;).

I want this on a 👕 😉

#35 (comment) 7 coordinates: 3 spatial, the 3 components of the polarization vector, and time

How would you optionally encode them?

#35 (comment) What do you think @joshmoore? I can start working on this.

💯

@glyg
Contributor

glyg commented Mar 8, 2021

How would you optionally encode them?

{
    "axes": ["x", "y", "z", "rho", "theta", "phi", "t"],
    "units": ["micrometer", "micrometer", "micrometer", "radians", "radians", "radians"]
}
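In this layout `axes` and `units` are paired by position, so a consumer can report which axes are left without a unit. The sketch below uses a hypothetical function name. Applied to the example above exactly as written (seven axes, six units), it reports `t`.

```python
from typing import Dict, List, Optional


def pair_units(meta: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Pair each axis with the unit at the same position (None if missing)."""
    axes, units = meta["axes"], meta.get("units", [])
    if len(units) > len(axes):
        raise ValueError("more units than axes")
    return {axis: units[i] if i < len(units) else None
            for i, axis in enumerate(axes)}


example = {
    "axes": ["x", "y", "z", "rho", "theta", "phi", "t"],
    "units": ["micrometer", "micrometer", "micrometer",
              "radians", "radians", "radians"],
}
missing = [axis for axis, unit in pair_units(example).items() if unit is None]
print(missing)  # ['t']
```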

@tischi

tischi commented Mar 8, 2021

@glyg That's an interesting use case! As mentioned above, I think this may overlap quite a bit with #28, where we discuss how to map from data space (no units, just dimensions) to physical space (e.g. spatial or possibly angles). So it could be useful to look at that issue and maybe re-post your example there.

@constantinpape
Contributor Author

I have proposed some initial changes in #39 to lift the 5d requirement, but otherwise did not change anything w.r.t. the current spec.
I will try to summarise the discussion here soon to see how to continue after #39 gets merged.

@constantinpape
Contributor Author

constantinpape commented Mar 16, 2021

#39 now introduces axes as a MUST field in multiscales and allows up to 5 dimensions, with values for axes restricted to x, y, z, c, t. This change will be breaking with respect to 0.1, and in the reviews @joshmoore remarked that it would be a good idea to check whether any of the potential changes we discussed here would again be breaking with respect to the (proposed) 0.2.

To summarize, I think we have discussed the following possible changes (relative to 0.2):

  • Allow more than 5 dimensions.
  • Allow arbitrary names in axes instead of just x, y, z, c, t
  • Add another field units to specify the physical dimension for each axis (side note: going back to the discussion in Transformation Specification #28, it's unclear whether this is necessary here or only in the transformation)

As far as I can see none of these changes would be breaking with the 0.2 proposal.
Anything I forgot here? Can anybody see issues with 0.2 that would require a breaking change in the future?

@k-dominik

k-dominik commented Apr 19, 2021

Hi - adding in a few cents here as well...

When I was reading it, I was thinking about what viewers would like best. I think this issue/discussion should allow a complete newcomer to design a super simple viewer that enables rudimentary viewing of all data that claims to be ngff. One of the reasons people still go around using pngs, jpgs, tifs and the like is that they can view them with their system image viewer by simply dragging and dropping. Ever tried this with an hdf5 file and the de facto image viewer of the bioimage community, Fiji?! No dice. If the outcome of this discussion is that we allow arbitrary data with arbitrary axes, then this is as good as doing nothing: no new developer will be able to come up with a viewer that makes sense based on the specification. I think this encourages fragmentation. No one would be able to "understand" the data. With a fixed, limited set of axes in the data/pixel/image/voxel space you could truly have a format that all viewers could support, where looking into the image space will look more or less the same in all of them. Isn't this one of the goals?

The semantic meaning of the axes and units and the likes can be handled by smarter viewers: depending on the application they might use the transformation (as discussed in #28).

@k-dominik

Adding to the comment above: I think some axes should have a fixed meaning and name (tzyx); the rest could be handled as channels by "naive" consumers, whereas applications closer to the data can handle them in a specialized way.
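A sketch of how such a "naive" consumer could split axes. It assumes, as proposed here, that only `t`, `z`, `y`, `x` carry a fixed meaning and that every other axis is handled as a channel-like axis. The function name is hypothetical.

```python
from typing import Dict, List, Sequence

FIXED_AXES = ("t", "z", "y", "x")


def split_axes(axes: Sequence[str]) -> Dict[str, List[int]]:
    """Return the indices of fixed-meaning axes and of channel-like axes."""
    fixed = [i for i, name in enumerate(axes) if name in FIXED_AXES]
    channel_like = [i for i, name in enumerate(axes) if name not in FIXED_AXES]
    return {"fixed": fixed, "channel_like": channel_like}


# The polarization example from earlier in the thread:
print(split_axes(["x", "y", "z", "rho", "theta", "phi", "t"]))
# {'fixed': [0, 1, 2, 6], 'channel_like': [3, 4, 5]}
```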

@joshmoore
Member

See the new PR at #46

@imagesc-bot

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/next-call-on-next-gen-bioimaging-data-tools-early-september-2021/55333/14

@constantinpape
Contributor Author

To summarize the current state:

  • Since v0.3 we allow 2 to 5d data and have the axes field, which labels each dimension and has allowed values tczyx (redundancy not allowed!)
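The v0.3 rules as summarised above are easy to check mechanically: 2 to 5 dimensions, names drawn from `tczyx`, and no repeats. This is a sketch of just those three rules, not the official validator. In particular it does not check axis order.

```python
from typing import Sequence

ALLOWED_AXES = set("tczyx")


def validate_axes_v03(axes: Sequence[str]) -> None:
    """Raise ValueError if axes break the three rules summarised for v0.3."""
    if not 2 <= len(axes) <= 5:
        raise ValueError(f"expected 2 to 5 axes, got {len(axes)}")
    unknown = [a for a in axes if a not in ALLOWED_AXES]
    if unknown:
        raise ValueError(f"axis names not in 'tczyx': {unknown}")
    if len(set(axes)) != len(axes):
        raise ValueError("duplicate axis names are not allowed")
```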

I think it's straightforward to also add an optional field units with the same length as axes and this can be done in one of the next versions.

In addition, I can see two more controversial potential changes that lift the restrictions above:

I am personally more in favor of keeping the spec more restrictive, but we need to see if there are some important use-cases that cannot be covered with the current spec. This is also very relevant for the issue of specifying transformations.

@constantinpape
Contributor Author

Note also the proposal by @bogovicj and @axtimwalde here, which introduces a label, type and unit per dimension with a list of objects (=map/dict).
This diverges a bit from our current solution of having axes as list.
But it would be easy to have an equivalent solution using 3 lists, e.g. axes, axes_label and unit.
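The two layouts carry the same information, as this sketch of a conversion between a list of per-axis objects and parallel lists shows. The key names (`label`, `type`, `unit`) follow the description above. The list names (`labels`, `types`, `units`) are hypothetical.

```python
from typing import Dict, List

KEYS = ("label", "type", "unit")


def objects_to_lists(axes: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """[{'label': ..., 'type': ..., 'unit': ...}, ...] -> {'labels': [...], ...}"""
    return {key + "s": [axis[key] for axis in axes] for key in KEYS}


def lists_to_objects(lists: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Inverse of objects_to_lists; all lists must have the same length."""
    columns = [lists[key + "s"] for key in KEYS]
    if len({len(col) for col in columns}) != 1:
        raise ValueError("parallel lists must have equal length")
    return [dict(zip(KEYS, values)) for values in zip(*columns)]
```

The object form cannot drift out of sync, whereas the parallel-list form needs the explicit length check.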

@d-v-b
Contributor

d-v-b commented Sep 2, 2021

Once you've specified a unit (assuming it's an SI unit), you have basically already specified the axis type, no? So it seems like axis_type is unnecessary (and potentially confusing, if someone accidentally does something like {axis_type : time, unit: nm}).

@bogovicj
Contributor

bogovicj commented Sep 2, 2021

The below was discussed in the ngff meeting on 01 Sept 2021.

A counterexample might be channels acquired at different wavelengths: their unit is a physical length, which clashes with the spatial domain.
Ideas:

  • Use a more general way of describing the domain that can describe a categorical / discrete axis
    • how do we spec this? we should brainstorm
  • Use spatial frequency units instead of wavelength?
    • i can imagine users and microscope vendors not liking this

@tischi

tischi commented Sep 6, 2021

Maybe the word channel is misleading anyway? Maybe "setup", as in the BDV file format, is more appropriate. For example, we sometimes acquire the exact same fluorescence "channel" in terms of emission wavelengths, but with a couple of different exposure times to accommodate different sample brightness. Another example is acquiring the same emission wavelength but with different excitation wavelengths for some of the ratiometric sensor fluorophores. Thus associating "channel" very strictly with the emission wavelength band is maybe too limiting?

@constantinpape
Contributor Author

Follow-up from last week's ngff meeting: there was fairly broad consensus that the axis label should be decoupled from its semantic meaning, and that consequently there should be a new field for the "semantic" axis type (time, space, channel or similar; see the comment by @tischi above). In addition, we want to add unit, which has some relation to type (e.g. type: time, unit: meter doesn't make sense), but there is no strict one-to-one correspondence, as @bogovicj pointed out above.
There were some additional discussions about allowing more than 5 dimensions and adding more axis types. My personal preference would be not to include these changes now, but rather to make sure that the current changes are extensible enough to allow work on this in later versions.

I will start to work on spec v0.4 now and begin by making a PR for the changes laid out above; I will implement the solution that seems best to my judgment and try to lay out all discussion points I can see in the PR. We will announce once the PR is ready to be discussed on github and on image.sc.
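The sketch below shows the kind of consistency check that separating type from unit allows. It assumes small, hypothetical vocabularies of accepted unit names per type. `space` and `time` axes reject clearly mismatched units such as `type: time, unit: meter`. Types without a vocabulary, such as `channel`, accept any unit, so channels labelled by wavelength in nanometers remain valid.

```python
from typing import Optional

# Hypothetical, deliberately incomplete unit vocabularies.
UNITS_BY_TYPE = {
    "space": {"nanometer", "micrometer", "millimeter", "meter"},
    "time": {"millisecond", "second", "minute", "hour"},
}


def check_type_unit(axis_type: str, unit: Optional[str]) -> None:
    """Raise ValueError if unit clearly contradicts axis_type.

    Types without a vocabulary here (e.g. "channel") accept any unit,
    because type and unit are related but not one-to-one.
    """
    allowed = UNITS_BY_TYPE.get(axis_type)
    if unit is None or allowed is None:
        return
    if unit not in allowed:
        raise ValueError(f"unit {unit!r} does not fit axis type {axis_type!r}")
```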

@thewtex
Contributor

thewtex commented Sep 7, 2021

Thus associating "channel" very strictly with the emission wavelength band is maybe too limiting?

component could also be considered -- it is semantically more general but has the same non-space-time association, and it also starts with a c :-)

@unidesigner

Hi @constantinpape et al. Just wanted to make you aware of some of the discussion around axes metadata in this neuroglancer issue. It'd be good to know how some of the discussions therein could be fed into the discussion/proposal process for the ome-ngff specs on axes metadata.

@satra

satra commented Sep 25, 2021

as a slight aside: regarding units as text we have found this text representation quite useful: https://people.csail.mit.edu/jaffer/MIXF/CMIXF-12 and we adopted this in the BIDS standard (https://bids-specification.readthedocs.io/en/stable/99-appendices/05-units.html). here is a python library to support parsing: https://github.com/sensein/cmixf

@constantinpape
Contributor Author

I have started to put something together for the new axes metadata based on the discussions here in #57.
I am now working on transformations and will start a broader call for feedback once both proposals are done (given that these are linked), but feel free to comment on the axes metadata proposal already.

@constantinpape
Contributor Author

This is now implemented with v0.4 :).
