Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JPEG-XL #80

Merged
merged 18 commits into from
Feb 15, 2022
Merged

JPEG-XL #80

merged 18 commits into from
Feb 15, 2022

Conversation

JackKelly
Copy link
Member

@JackKelly JackKelly commented Feb 14, 2022

Pull Request

Description

Use JPEG-XL to compress satellite Zarr images.

Fixes #66
Fixes #67

  • For satellite images:
    • Rescales satellite image values to the range [0, 1] using float32.
    • Under the hood, encode NaNs in satellite images as the value 0.025.
    • Renamed "Compressor" to "ScaleToZeroToOne".
  • For cloud masks:
    • Save as int8, with NaNs represented as -1 (I think the previous code saved as float32?)
  • Modified the README
  • Modified generate_test_plots.py
  • Renamed stacked_eumetsat_data to data (Issue Rename stacked_eumetsat_data to data; and rename variable to channels? #66)
  • ignored the downloaded image files in .gitignore
  • This PR does not implement a script to convert existing Zarrs to JPEG-XL. That will be done when issue Script to convert "old" Zarrs to JPEG-XL #81 is implemented.

I've changed the channels_chunk_size from 11 to 1 because the imagecodecs interface to JPEG-XL doesn't yet know how to decompress multiple images per JPEG-XL file (so it has to store each channel and each timestep as a separate file).

@jacobbieker This might also mean that we might want to consider using large image sizes (in pixels). If the compressed images are too small (much less than 1 MByte on average) then we might want to consider using larger image sizes, I'd guess.

How Has This Been Tested?

  • Yes, the test plots run (locally and on GitHub CI)

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@codecov-commenter
Copy link

codecov-commenter commented Feb 14, 2022

Codecov Report

Merging #80 (a9e3d2f) into main (a4de145) will not change coverage.
The diff coverage is 0.00%.

Impacted file tree graph

@@          Coverage Diff          @@
##            main     #80   +/-   ##
=====================================
  Coverage   0.00%   0.00%           
=====================================
  Files          8       9    +1     
  Lines        679     710   +31     
=====================================
- Misses       679     710   +31     
Impacted Files Coverage Δ
satip/intermediate.py 0.00% <ø> (ø)
satip/jpeg_xl_future.py 0.00% <0.00%> (ø)
satip/scale_to_zero_to_one.py 0.00% <0.00%> (ø)
satip/utils.py 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a4de145...a9e3d2f. Read the comment docs.

@JackKelly JackKelly self-assigned this Feb 15, 2022
dataarray = dataarray.round().clip(min=0, max=3).astype(np.int8)
dataarray.attrs = serialize_attrs(dataarray.attrs)
# Convert 3's to NaNs as they should be No Data/Space
dataarray = dataarray.where(dataarray["variable"] != 3)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line implicitly converted the cloud masks back to float64 (because int8 can't represent NaNs)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, possibly, the cloud masks, when I open them up, are int8, so possibly something else is happening to the NaNs

@@ -0,0 +1,182 @@
"""Thin wrapper around imagecodecs.JpegXl.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider moving this file into a separate repository? Maybe it's fine in satip for now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its fine here, if this is the only place we are using JPEGXL for now

@JackKelly JackKelly marked this pull request as ready for review February 15, 2022 12:38
@JackKelly
Copy link
Member Author

JackKelly commented Feb 15, 2022

Sorry, I should've said: I haven't tried running convert_native_to_zarr.py because I didn't want to disrupt the currently running jobs on leonardo!

But generate_test_plots.py runs successfully!

@jacobbieker
Copy link
Member

Sorry, I should've said: I haven't tried running convert_native_to_zarr.py because I didn't want to disrupt the currently running jobs on leonardo!

But generate_test_plots.py runs successfully!

No worries! If the generate test plot runs, then it does work end to end, the test plots uses all the same steps as the other scripts, so it should be fine!

Copy link
Member

@jacobbieker jacobbieker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work, thanks!

@jacobbieker
Copy link
Member

Might merge this now, just to make it easier working on the conversion script

@jacobbieker jacobbieker merged commit 509592a into main Feb 15, 2022
@jacobbieker jacobbieker deleted the jack/jpeg-xl branch February 15, 2022 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
3 participants