Experiment with using lossless JPEG-XL (colorspace=YUV 400) #67

Closed
@JackKelly

Description

Detailed Description

JPEG-XL is the "new kid on the block" for image compression, and it can losslessly compress greyscale images (using colorspace YUV 400). It might be a great fit for compressing satellite images into our Zarr datasets.

Context

See issue #45 for previous experiments and notes.

This great comparison of lossless compression using JPEG-XL, WebP, AVIF, and PNG suggests JPEG-XL wins.

Possible Implementation

imagecodecs includes an interface to JPEG-XL.

The next step is to try manually compressing images using the cjxl app (not ImageMagick).

If that looks good, then create a small stand-alone adapter library so that imagecodecs can be used with Zarr. Here's a super-simple little Python library (just 51 lines of code!) which enables JPEG-2000 compression in Zarr using imagecodecs. Maybe it'd be possible to use the same pattern for JPEG-XL? UPDATE: @cgohlke has already implemented this in imagecodecs! (See comment below.)

To use ImageMagick for quick experiments at the command line:

You need ImageMagick version > 7.0.10 to use JPEG-XL.

To install ImageMagick >7.0.10:

sudo apt-get install imagemagick libmagick++-dev 

See here for how to install ImageMagick from source.

Then use the 'magick' command, not 'convert'.

I'll do some experiments later today or tomorrow 🙂

TODO

  • investigate whether the JPEG-XL lossy uint8 images are fine as is. They look great visually, and 4 timesteps of UK HRV take only 0.6 MB (compared to 2.9 MB with bzip2, and 2.2 MB with JPEG-XL lossless uint16).
  • investigate whether we can use different color profiles. See these docs.
  • investigate whether imagecodecs JPEG-XL can simply be installed through pip (or does it require libjxl to be manually installed first?). If it requires a manual install, that may make it inappropriate for a dataset that we might release publicly.
  • to get to 8 bits, divide by 4 AND ROUND
  • Try the ideas Jon suggested on the JPEG-XL github issue queue (especially putting all 12 channels into a single JPEG-XL, using lossless compression)
  • See if it's possible to change the suffix of each Zarr chunk to .jxl (can't see how to do this and, anyway, Chrome cannot currently open jxl files)
  • Check output is the same (or roughly the same) as the input: plot gamma curve; compute MSE; etc.
  • try using alpha channel for NaNs. (haven't tried this but not going to bother because we can just use float16)
  • try float16 for saving NaNs. Yup, float16 saves NaNs, and there's no gamma curve. Need to map values to the range [0, 1].
  • try float32 with JPEG-XL
  • measure decompression speed of JPEG-XL vs bzip2
  • prepare a PR for Satip for using JPEG-XL
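
For the "divide by 4 AND ROUND", float16 and "check output" items, here is a rough NumPy sketch, not the code actually used in these experiments. It assumes the raw pixels are 10-bit integers (0 to 1023), which is what dividing by 4 to reach 8 bits implies. The function names `to_uint8`, `to_unit_float16` and `mse` are made up for illustration. Note the clip: 1023 / 4 = 255.75 rounds to 256, which does not fit in uint8.

```python
import numpy as np


def to_uint8(img10: np.ndarray) -> np.ndarray:
    """Rescale assumed 10-bit values (0..1023) to 8 bits: divide by 4, round, clip."""
    scaled = np.round(img10.astype(np.float32) / 4)
    # Without the clip, 256 would wrap around to 0 when cast to uint8.
    return np.clip(scaled, 0, 255).astype(np.uint8)


def to_unit_float16(img: np.ndarray, max_value: float = 1023.0) -> np.ndarray:
    """Map values to [0, 1] and store as float16; NaNs pass through unchanged."""
    return (np.asarray(img, dtype=np.float32) / max_value).astype(np.float16)


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error, ignoring positions where either input is NaN."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.nanmean(diff ** 2))


# Tiny demonstration on hand-picked values, including the 1023 edge case.
img10 = np.array([0, 1, 2, 3, 510, 1022, 1023], dtype=np.uint16)
img8 = to_uint8(img10)
print(img8.tolist())  # [0, 0, 0, 1, 128, 255, 255]
print(mse(img10, img8.astype(np.uint16) * 4))
```

Multiplying the uint8 result by 4 and comparing it with the original gives a baseline error for the quantisation alone, before any codec is involved. The same `mse` can then be applied to the decoded JPEG-XL output.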
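
For the decompression-speed item, a minimal timing harness might look like the following. It only uses standard-library codecs (bz2 and zlib) on synthetic bytes as stand-ins. The helper name `time_decode` and the test payload are assumptions for illustration, and a JPEG-XL decoder (for example imagecodecs' `jpegxl_decode`) would need to be added as another entry, fed with real encoded image chunks.

```python
import bz2
import time
import zlib


def time_decode(decode, payload, repeats=20):
    """Return the best wall-clock time (seconds) over `repeats` calls of decode(payload)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decode(payload)
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic ~1 MB payload; real satellite chunks will behave differently.
raw = bytes(range(256)) * 4096

# Each entry maps a name to (compressed bytes, decoder function).
candidates = {
    "bz2": (bz2.compress(raw), bz2.decompress),
    "zlib": (zlib.compress(raw), zlib.decompress),
}

results = {name: time_decode(dec, blob) for name, (blob, dec) in candidates.items()}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

Taking the minimum over several repeats reduces noise from other processes. For a fair comparison, every codec should decode the same original chunk.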

Labels

enhancement: New feature or request
