Encode sequences of satellite images using modern video compression like AV1 #45
If we want to use video compression in our Zarrs, then we'd probably have to use Zarr chunks that span multiple timesteps. If, instead, we want to compress each timestep independently, then AVIF (which uses AV1 compression for still images) might be worth looking at. |
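To make the chunking trade-off concrete, here is a small sketch that lists the chunk keys for an array with dimensions (time, y, x). It assumes Zarr v2 style keys with the default "." separator. The function names are hypothetical, and the sketch only does the bookkeeping; it doesn't touch Zarr itself. With a time chunk of 1, each chunk holds a single frame, so whatever codec compresses it never sees the neighbouring timesteps.

```python
import itertools
import math


def chunk_keys(shape, chunks):
    """Zarr-v2-style chunk keys ("t.y.x", default "." separator) for an array
    of `shape` split into chunks of shape `chunks`."""
    counts = [math.ceil(n / c) for n, c in zip(shape, chunks)]
    grid = itertools.product(*(range(k) for k in counts))
    return [".".join(str(i) for i in idx) for idx in grid]


def timesteps_in_chunk(t_index, n_timesteps, t_chunk):
    """Timestep indices stored in the chunk at position `t_index` along time."""
    start = t_index * t_chunk
    return list(range(start, min(start + t_chunk, n_timesteps)))


# One timestep per chunk: a codec only ever sees a single frame.
print(len(chunk_keys((12, 256, 256), (1, 256, 256))))
# All 12 timesteps in one chunk: a video-style codec could use inter-frame redundancy.
print(chunk_keys((12, 256, 256), (12, 256, 256)))
```

The cost of long time chunks is that reading a single timestep means decoding the whole chunk.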
Copying a Slack conversation @jacobbieker and I just had... Jacob:
Me:
Jacob:
Me:
Jacob:
|
Yeah, so, I'd recommend trying AVIF first (the still-image version of AV1). If AVIF is hard to implement, then slightly lossy 8-bit JPEGs might be worth a go. I don't know for sure, but I'm pretty sceptical that our models are currently benefiting from pristine 10-bit lossless imagery! 🙂 |
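To get a feel for how much precision the 10-bit to 8-bit step throws away, here is a sketch of linear rescaling and its round-trip error. It assumes the raw values are integers in [0, 1023]. Real data might need an offset or a different scale first. The helper names are hypothetical.

```python
def to_uint8(value, max_in=1023):
    """Linearly rescale an integer in [0, max_in] to the nearest integer in [0, 255]."""
    return round(value * 255 / max_in)


def from_uint8(value, max_in=1023):
    """Approximate inverse of to_uint8 (the quantisation loss can't be undone)."""
    return round(value * max_in / 255)


# Worst-case absolute error after a 10-bit -> 8-bit -> 10-bit round trip.
worst = max(abs(from_uint8(to_uint8(v)) - v) for v in range(1024))
print(f"worst-case round-trip error: {worst} out of 1023")
```

Each 8-bit step covers about four 10-bit levels, so the round-trip error is at most a couple of counts. This is before any lossy JPEG artefacts are added on top.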
The Python library imagecodecs supports AVIF. |
And here's a super-simple little Python library (just 51 lines of code!) that enables JPEG 2000 compression in Zarr using imagecodecs. Maybe it'd be possible to use the same pattern, but for AVIF? |
Thanks! I'll try those out |
Awesome, thanks! Right, I'll stop procrastinating tax-related tasks now... 🙂 |
And here's a useful issue, including a short guide to creating codecs for Zarr: zarr-developers/numcodecs#73 |
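For illustration, here is a minimal sketch of the codec pattern. It is shaped like the numcodecs `Codec` interface: a `codec_id`, `encode`, `decode(buf, out=None)`, `get_config` and `from_config`. In a real library the class would subclass `numcodecs.abc.Codec` and be registered with `numcodecs.register_codec`. An AVIF codec would call imagecodecs' AVIF encoder and decoder where this sketch calls `bz2`. bz2 is only a stand-in so the sketch runs with the standard library, and numcodecs already ships its own BZ2 codec. The class name and `codec_id` are hypothetical.

```python
import bz2


class ExampleBz2Codec:
    """Minimal codec following the numcodecs Codec interface (duck-typed).

    Real use: subclass numcodecs.abc.Codec and call
    numcodecs.register_codec(ExampleBz2Codec) so Zarr can find it by codec_id.
    """

    codec_id = "example_bz2"  # hypothetical id; must be unique in the registry

    def __init__(self, level=9):
        self.level = level

    def encode(self, buf):
        # Accept any buffer-protocol object (bytes, bytearray, C-contiguous arrays).
        return bz2.compress(bytes(memoryview(buf)), compresslevel=self.level)

    def decode(self, buf, out=None):
        data = bz2.decompress(bytes(memoryview(buf)))
        if out is None:
            return data
        view = memoryview(out).cast("B")  # flat byte view of the output buffer
        if len(view) < len(data):
            raise ValueError("output buffer too small")
        view[: len(data)] = data
        return out

    def get_config(self):
        return {"id": self.codec_id, "level": self.level}

    @classmethod
    def from_config(cls, config):
        config = dict(config)
        config.pop("id", None)
        return cls(**config)
```

Zarr stores the `get_config` dict in the array metadata and rebuilds the codec via `from_config`, which is why both are needed.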
FWIW, the imagecodecs library includes numcodecs-compatible codecs. Register with |
Using the jpeg2k codec, saving 3 channels as individual timesteps saves about 57%, when using |
Using bz2 (which pbzip2 is based on) reduces the size of the Zarrs by 23%, while being lossless and not needing any special handling. |
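Comparisons like these could be scripted with a small harness. This sketch sticks to standard-library compressors. zstd (the baseline in this thread) needs a third-party package, so zlib stands in as the reference here. The names `CODECS` and `compare_codecs` are hypothetical. Each codec is also checked for a lossless round trip.

```python
import bz2
import lzma
import zlib

# name -> (compress, decompress); all from the standard library
CODECS = {
    "zlib-6": (lambda b: zlib.compress(b, 6), zlib.decompress),
    "bz2-1": (lambda b: bz2.compress(b, 1), bz2.decompress),
    "bz2-3": (lambda b: bz2.compress(b, 3), bz2.decompress),
    "bz2-5": (lambda b: bz2.compress(b, 5), bz2.decompress),
    "bz2-9": (lambda b: bz2.compress(b, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compare_codecs(data, baseline="zlib-6"):
    """Return {codec name: % size reduction relative to `baseline`}."""
    sizes = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        if decompress(packed) != data:
            raise ValueError(f"{name} did not round-trip losslessly")
        sizes[name] = len(packed)
    ref = sizes[baseline]
    return {name: 100 * (1 - size / ref) for name, size in sizes.items()}


# Feed in the raw bytes of a real chunk to get meaningful numbers.
for name, pct in compare_codecs(bytes(i % 17 for i in range(100_000))).items():
    print(f"{name}: {pct:+.1f}%")
```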
Higher levels for zstd didn't result in any real savings |
#47 might also be an easy win on compression. Tried AVIF, but it doesn't do monochrome images; will try with 3-channel images soon. |
Fixing the dtype saves another 13% in size when using bz2. |
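One way to try 3-channel images is to put three single-channel satellite frames into the R, G and B planes of one image. Here is a sketch of the pixel-interleaved layout an (H, W, 3) array has. With NumPy this is just `np.stack([c0, c1, c2], axis=-1)`. The function names are hypothetical. One caveat: lossy RGB encoders often convert to YUV and subsample chroma, which would treat the three channels unequally.

```python
def interleave_channels(ch0, ch1, ch2):
    """Pack three equally sized, row-major single-channel frames into one
    pixel-interleaved sequence: c0, c1, c2, c0, c1, c2, ..."""
    if not len(ch0) == len(ch1) == len(ch2):
        raise ValueError("channels must have the same number of pixels")
    out = []
    for pixel in zip(ch0, ch1, ch2):
        out.extend(pixel)
    return out


def deinterleave_channels(pixels):
    """Inverse of interleave_channels: split back into three channels."""
    return pixels[0::3], pixels[1::3], pixels[2::3]


rgb_like = interleave_channels([1, 2], [3, 4], [5, 6])
print(rgb_like)
```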
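This sketch assumes "fixing the dtype" means storing the 10-bit values in a 16-bit integer type rather than a wider float type. It shows the raw-size difference before compression and prints the bz2 sizes for both. The pixel values are synthetic placeholders, not real satellite data.

```python
import bz2
from array import array

# Synthetic 10-bit "pixels" (placeholders for real satellite values).
values = [(x * 7 + y * 13) % 1024 for y in range(128) for x in range(128)]

as_float64 = array("d", values).tobytes()  # 8 bytes per value
as_uint16 = array("H", values).tobytes()   # 2 bytes per value; enough for 10-bit data

for name, raw in [("float64", as_float64), ("uint16", as_uint16)]:
    packed = bz2.compress(raw, 5)
    print(f"{name}: raw {len(raw)} B, bz2 level 5: {len(packed)} B")
```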
Fixing the dtype and using bz2 level 3 or 5 results in a 28% reduction in size compared to zstd |
Video compression has developed a lot over recent years (driven by Netflix etc.)
Our sequences of satellite images and NWPs can be considered video sequences. There's lots of redundant information across frames. So, if we wanted to squish the data down as much as possible (e.g. for sharing with students; or for regularly sending to Lancium; or just for archiving many years of data without breaking the bank) then we might want to consider using video compression like AV1 to compress our satellite data and/or NWP data.
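A quick way to see why cross-frame redundancy matters is to compress a set of near-identical frames one at a time and then as a single stream. zlib is not a video codec. It's a stand-in here: when it can "see" the previous frame within its 32 KiB window, it reuses it. Video codecs exploit the same redundancy explicitly through inter-frame prediction. The frames are synthetic random bytes with a few changed pixels, which is an assumption, not real imagery.

```python
import random
import zlib

FRAME_BYTES = 4096  # small enough for zlib's 32 KiB window to span several frames
N_FRAMES = 8

rng = random.Random(0)
base = bytes(rng.randrange(256) for _ in range(FRAME_BYTES))

frames = []
for _ in range(N_FRAMES):
    frame = bytearray(base)
    for _ in range(20):  # each frame differs from the base in a few "pixels"
        frame[rng.randrange(FRAME_BYTES)] = rng.randrange(256)
    frames.append(bytes(frame))

independent = sum(len(zlib.compress(f, 9)) for f in frames)  # each frame alone
joint = len(zlib.compress(b"".join(frames), 9))               # whole sequence at once
print(f"per-frame total: {independent} B, whole sequence: {joint} B")
```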
ffmpeg supports AV1 encoding, including lossless, 10-bit, and 12-bit modes.
And ffmpeg-python supports moving data between numpy arrays and ffmpeg.
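Without ffmpeg-python, the same data movement can be sketched with the standard library: pack the frames as raw samples and pipe them to the `ffmpeg` CLI on stdin. This assumes 10-bit single-channel frames stored as flat lists, ffmpeg's `gray10le` raw pixel format, and an ffmpeg build with the `libaom-av1` encoder. The function names and the `crf=30` default are illustrative choices. The sketch uses constant-quality mode (`-crf` with `-b:v 0`). Lossless AV1 needs encoder-specific settings that it doesn't cover.

```python
import sys
from array import array


def gray10le_bytes(frames):
    """Pack frames (flat, row-major lists of ints in [0, 1023]) as little-endian
    16-bit samples, the layout of ffmpeg's gray10le pixel format."""
    samples = array("H")  # unsigned 16-bit on mainstream platforms
    for frame in frames:
        if any(not 0 <= v <= 1023 for v in frame):
            raise ValueError("gray10le samples must be in [0, 1023]")
        samples.extend(frame)
    if sys.byteorder == "big":  # array uses native byte order
        samples.byteswap()
    return samples.tobytes()


def av1_command(width, height, out_path, crf=30):
    """ffmpeg argv: raw gray10le frames on stdin -> AV1 (libaom) written to out_path."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "gray10le",
        "-video_size", f"{width}x{height}", "-framerate", "1",
        "-i", "pipe:0",
        "-c:v", "libaom-av1", "-crf", str(crf), "-b:v", "0",
        "-pix_fmt", "yuv420p10le",
        out_path,
    ]
```

Usage would be something like `subprocess.run(av1_command(64, 64, "example.mkv"), input=gray10le_bytes(frames), check=True)`. Decoding goes the other way: `-f rawvideo -pix_fmt gray10le pipe:1` as the output.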
If we really wanted to, we could probably write a numcodecs-like compression library to allow us to use ffmpeg to compress stuff, and still save into NetCDF / Zarr.

In terms of pre-prepared batches, it may be far easier to save each example as a standard video file (rather than trying to use AV1 within NetCDF), e.g. save each example as a sequence of TIFFs, and then ask ffmpeg to convert those TIFFs to a video file compressed using AV1. We can do that now that we're saving each modality separately :)
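The TIFF-sequence route for a single example could look like this sketch. It assumes one numbered TIFF per timestep in a per-example directory, written separately (e.g. with a TIFF library; not shown). ffmpeg's image2 demuxer can then read the numbered sequence. The file-naming scheme and function names are hypothetical, and `libaom-av1` must be available in the ffmpeg build.

```python
from pathlib import Path

FRAME_PATTERN = "frame_%04d.tiff"  # hypothetical naming scheme, one file per timestep


def frame_paths(example_dir, n_frames):
    """Paths the per-timestep TIFFs would be written to, in time order."""
    return [Path(example_dir) / (FRAME_PATTERN % i) for i in range(n_frames)]


def tiffs_to_av1_command(example_dir, out_path, crf=30):
    """ffmpeg argv that reads the numbered TIFF sequence (image2 demuxer,
    numbering starting at 0) and encodes it as 10-bit AV1 with libaom."""
    return [
        "ffmpeg", "-y",
        "-framerate", "1",
        "-i", str(Path(example_dir) / FRAME_PATTERN),
        "-c:v", "libaom-av1", "-crf", str(crf), "-b:v", "0",
        "-pix_fmt", "yuv420p10le",
        out_path,
    ]


print([p.name for p in frame_paths("example_0001", 3)])
print(" ".join(tiffs_to_av1_command("example_0001", "example_0001.mkv")))
```

Keeping one video per example per modality matches saving each modality separately, and the frames can be recovered by running ffmpeg in reverse.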
This is not a priority, of course!
Twitter discussion.