JPEG-XL #80

JackKelly · 2022-02-14T19:45:25Z

Pull Request

Description

Use JPEG-XL to compress satellite Zarr images.

Fixes #66
Fixes #67

- For satellite images:
  - Rescales satellite image values to the range [0, 1] using float32.
  - Under the hood, encodes NaNs in satellite images as the value 0.025.
  - Renames "Compressor" to "ScaleToZeroToOne".
- For cloud masks:
  - Saves as int8, with NaNs represented as -1 (I think the previous code saved them as float32?).
- Modifies the README.
- Modifies generate_test_plots.py.
- Renames stacked_eumetsat_data to data (issue #66: "Rename stacked_eumetsat_data to data; and rename variable to channels?").
- Adds the downloaded image files to .gitignore.

This PR does not implement a script to convert existing Zarrs to JPEG-XL. That will be done when issue #81 ("Script to convert 'old' Zarrs to JPEG-XL") is implemented.
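The satellite-image encoding described above can be sketched with NumPy. The PR only states that values are rescaled to [0, 1] as float32 and that NaNs are encoded as 0.025. Everything else here is an assumption for illustration, not satip's actual code: the per-channel min/max scaling, reserving the band [0.075, 1] for valid data so the sentinel has some headroom if JPEG-XL is used in a lossy mode, the decoding threshold of 0.05, and all function and constant names.

```python
import numpy as np

# NaN sentinel stated in the PR description.
NAN_VALUE = 0.025
# Assumption: valid data are shifted into [LOWEST_VALID_VALUE, 1] so that
# lossy compression is unlikely to push real pixels onto the NaN sentinel.
LOWEST_VALID_VALUE = 0.075
# Assumption: decoded values at or below this threshold are treated as NaN.
NAN_THRESHOLD = (NAN_VALUE + LOWEST_VALID_VALUE) / 2


def scale_to_zero_to_one(data, mins, maxs):
    """Linearly rescale each channel (last axis) to [0, 1] as float32; NaNs pass through."""
    data = np.asarray(data, dtype=np.float32)
    mins = np.asarray(mins, dtype=np.float32)
    maxs = np.asarray(maxs, dtype=np.float32)
    return np.clip((data - mins) / (maxs - mins), 0.0, 1.0)


def encode_nans(scaled):
    """Map [0, 1] onto [LOWEST_VALID_VALUE, 1], then replace NaNs with NAN_VALUE."""
    shifted = scaled * (1 - LOWEST_VALID_VALUE) + LOWEST_VALID_VALUE
    return np.nan_to_num(shifted, nan=NAN_VALUE).astype(np.float32)


def decode_nans(encoded):
    """Invert encode_nans: restore NaNs and map valid values back to [0, 1]."""
    encoded = np.asarray(encoded, dtype=np.float32)
    decoded = (encoded - LOWEST_VALID_VALUE) / (1 - LOWEST_VALID_VALUE)
    decoded[encoded <= NAN_THRESHOLD] = np.nan
    return decoded
```

Leaving a gap between the sentinel and the lowest valid value means a decoded pixel only has to land on the correct side of the threshold, rather than reproduce 0.025 exactly.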

I've changed the channels_chunk_size from 11 to 1 because the imagecodecs interface to JPEG-XL doesn't yet know how to decompress multiple images per JPEG-XL file (so it has to store each channel and each timestep as a separate file).

@jacobbieker This might also mean that we should consider using larger image sizes (in pixels). I'd guess that if the compressed images are too small (much less than 1 MByte each, on average), we'd want larger images.
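The chunking change and the image-size question can be made concrete with some back-of-the-envelope arithmetic. In this sketch the dimension names (`time`, `y`, `x`, `channels`), the array sizes, and the helper names are all hypothetical. The only facts taken from the PR are that the channel chunk size changed from 11 to 1 and that each JPEG-XL file then holds a single 2-D image.

```python
import math
from typing import Dict, Tuple


def num_chunk_files(shape: Dict[str, int], chunks: Dict[str, int]) -> int:
    """Number of chunk objects in a regular chunk grid (one file per chunk)."""
    n = 1
    for dim, size in shape.items():
        # Dimensions without an explicit chunk size are treated as unchunked.
        n *= math.ceil(size / chunks.get(dim, size))
    return n


def images_per_chunk(chunks: Dict[str, int], image_dims: Tuple[str, ...] = ("y", "x")) -> int:
    """How many 2-D images are stacked inside one chunk."""
    return math.prod(c for dim, c in chunks.items() if dim not in image_dims)


def uncompressed_chunk_bytes(chunks: Dict[str, int], itemsize: int = 4) -> int:
    """Uncompressed size of one chunk; itemsize=4 corresponds to float32."""
    return math.prod(chunks.values()) * itemsize


# Hypothetical array: 12 timesteps of 1000 x 1500 pixels with 11 channels.
shape = {"time": 12, "y": 1000, "x": 1500, "channels": 11}
old_chunks = {"time": 1, "y": 1000, "x": 1500, "channels": 11}
new_chunks = {"time": 1, "y": 1000, "x": 1500, "channels": 1}

for name, chunks in [("old", old_chunks), ("new", new_chunks)]:
    print(
        name,
        num_chunk_files(shape, chunks), "files,",
        images_per_chunk(chunks), "image(s) per file,",
        uncompressed_chunk_bytes(chunks), "bytes uncompressed per file",
    )
```

With these made-up numbers, the new layout writes 11 times as many files, each holding one 6,000,000-byte (uncompressed float32) image. The compressed size of each such file is what the 1 MByte rule of thumb would be checked against.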

How Has This Been Tested?

The test plots run, both locally and on GitHub CI.

Checklist:

My code follows OCF's coding style guidelines
I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
I have checked my code and corrected any misspellings



codecov-commenter · 2022-02-14T19:48:26Z

Codecov Report

Merging #80 (a9e3d2f) into main (a4de145) will not change coverage.
The diff coverage is 0.00%.

@@          Coverage Diff          @@
##            main     #80   +/-   ##
=====================================
  Coverage   0.00%   0.00%           
=====================================
  Files          8       9    +1     
  Lines        679     710   +31     
=====================================
- Misses       679     710   +31

Impacted Files	Coverage Δ
satip/intermediate.py	`0.00% <ø> (ø)`
satip/jpeg_xl_future.py	`0.00% <0.00%> (ø)`
satip/scale_to_zero_to_one.py	`0.00% <0.00%> (ø)`
satip/utils.py	`0.00% <0.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a4de145...a9e3d2f.

JackKelly · 2022-02-15T12:36:21Z

satip/utils.py

-        dataarray = dataarray.round().clip(min=0, max=3).astype(np.int8)
-        dataarray.attrs = serialize_attrs(dataarray.attrs)
-        # Convert 3's to NaNs as they should be No Data/Space
-        dataarray = dataarray.where(dataarray["variable"] != 3)


I think this line implicitly converted the cloud masks back to float64 (because int8 can't represent NaNs)?

Ah, possibly. The cloud masks are int8 when I open them up, so maybe something else is happening to the NaNs.
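The cloud-mask side can be sketched the same way. The int8 dtype and the -1 NaN sentinel come from the PR description, and the rounding and clipping to 0..3 mirror the removed lines in the diff. The function names are hypothetical; this is not satip's `compress_mask`. Because int8 cannot represent NaN, any operation that inserts NaNs into an integer array has to return a float dtype, which is what a sentinel value avoids.

```python
import numpy as np

# NaN sentinel for cloud masks, as described in the PR.
MASK_NAN = -1


def encode_mask(mask):
    """Round and clip to the classes 0..3, store NaNs as MASK_NAN, return int8."""
    mask = np.asarray(mask, dtype=np.float32)
    is_nan = np.isnan(mask)
    out = np.clip(np.round(mask), 0, 3)  # NaNs are still NaN at this point
    out[is_nan] = MASK_NAN               # replace them before casting to int8
    return out.astype(np.int8)


def decode_mask(encoded):
    """Return a float32 mask with MASK_NAN turned back into NaN."""
    encoded = np.asarray(encoded)
    out = encoded.astype(np.float32)
    out[encoded == MASK_NAN] = np.nan
    return out
```

Decoding back to float32 is only needed when downstream code expects NaNs; code that can work with the sentinel directly can keep the compact int8 array.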

JackKelly · 2022-02-15T12:37:58Z

satip/jpeg_xl_float_with_nans.py

@@ -0,0 +1,182 @@
+"""Thin wrapper around imagecodecs.JpegXl.


We could consider moving this file into a separate repository? Maybe it's fine in satip for now?

I think it's fine here, if this is the only place we're using JPEG-XL for now.

JackKelly · 2022-02-15T12:43:10Z

Sorry, I should've said: I haven't tried running convert_native_to_zarr.py because I didn't want to disrupt the currently running jobs on leonardo!

But generate_test_plots.py runs successfully!

jacobbieker · 2022-02-15T12:52:21Z

No worries! If generate_test_plots.py runs, then it does work end to end: the test plots use all the same steps as the other scripts, so it should be fine!

jacobbieker

LGTM! Great work, thanks!

jacobbieker · 2022-02-15T13:07:03Z

Might merge this now, just to make it easier to work on the conversion script.

JackKelly and others added 7 commits January 28, 2022 14:43

adding JPEG-XL.ipynb. Doesn't work yet.

471fb29

making progress with JPEG-XL

209bdfc

slight update

d63cb5e

lots of performance results about jpeg-xl

ac2b543

I think I'm happy with how to compress our intermediate Zarrs as JPEG-XL now

6dff00e

Merge branch 'main' into jack/jpeg-xl

f47ae6f

Write new Zarrs as JPEG-XL

c3e4c27

JackKelly added 10 commits February 14, 2022 20:01

hopefully fix test plots

8defc40

oops, fix wrong import of JpegXl

67eeabb

separately encode NaNs for satellite images vs masks

fe20248

Oops, remove the examples dimension

506e3e6

use issubclass not isinstance

0266ac7

remove dtype from calling code. Convert satellite data to float32

807ea52

convert dataarray to dataset before saving

1407539

convert dataarray to dataset before saving

e983566

call compress_mask from within load_cloudmask_to_dataset

a9e3d2f

test script runs!

acbabab

JackKelly self-assigned this Feb 15, 2022

Fix comments

5d57090

JackKelly commented Feb 15, 2022


JackKelly marked this pull request as ready for review February 15, 2022 12:38

JackKelly requested a review from jacobbieker February 15, 2022 12:38

jacobbieker approved these changes Feb 15, 2022


jacobbieker merged commit 509592a into main Feb 15, 2022

jacobbieker deleted the jack/jpeg-xl branch February 15, 2022 13:07
