Encode sequences of satellite images using modern video compression like AV1 #45

JackKelly · 2021-10-25T16:13:03Z

Video compression has developed a lot over recent years (driven by Netflix etc.)

Our sequences of satellite images and NWPs can be considered video sequences. There's lots of redundant information across frames. So, if we wanted to squish the data down as much as possible (e.g. for sharing with students; or for regularly sending to Lancium; or just for archiving many years of data without breaking the bank) then we might want to consider using video compression like AV1 to compress our satellite data and/or NWP data.

ffmpeg supports AV1 encoding, including lossless, 10-bit, and 12-bit modes.

And ffmpeg-python supports moving data between numpy arrays and ffmpeg.

If we really wanted to, we could probably write a numcodecs-like compression library to allow us to use ffmpeg to compress stuff, and still save into NetCDF / Zarr.

For pre-prepared batches, it may be far easier to save each example as a standard video file rather than trying to use AV1 within NetCDF. For example, we could save each example as a sequence of TIFFs, and then ask ffmpeg to convert those TIFFs to a video file compressed using AV1. We can do this now that we're saving each modality separately :)
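A minimal sketch of the TIFFs-to-AV1 step, written as a Python helper that only builds the ffmpeg command line. The file names, the frame rate, and the helper name `build_av1_lossless_command` are hypothetical. The `libaom-av1` encoder and the `-aom-params lossless=1` option are assumptions about the installed ffmpeg build and should be checked against `ffmpeg -h encoder=libaom-av1`. Bit-depth and pixel-format handling is deliberately left out.

```python
import shlex


def build_av1_lossless_command(tiff_pattern, output_path, framerate=12):
    """Build (but do not run) an ffmpeg command that encodes numbered TIFFs as AV1.

    Assumption: ffmpeg was built with libaom and accepts `-aom-params lossless=1`.
    """
    return [
        "ffmpeg",
        "-framerate", str(framerate),   # playback rate only; arbitrary for our use
        "-i", tiff_pattern,             # e.g. frame_0000.tif, frame_0001.tif, ...
        "-c:v", "libaom-av1",           # AV1 encoder (assumed to be available)
        "-aom-params", "lossless=1",    # request lossless mode (assumed option)
        output_path,
    ]


# Hypothetical paths, for illustration only.
command = build_av1_lossless_command("frames/frame_%04d.tif", "satellite.mkv")
print(shlex.join(command))
# To actually encode: subprocess.run(command, check=True)
```

Comparing the size of the output file with the total size of the input TIFFs would give a first estimate of the compression ratio.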

This is not a priority, of course!

Twitter discussion.


JackKelly · 2022-01-07T14:08:18Z

If we want to use video compression in our Zarrs then we might have to use Zarr chunks which span multiple timesteps.
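To illustrate why the chunks would need to span multiple timesteps, here is a toy sketch using only the standard library. It is not a video codec: `zlib` stands in for a compressor that can see several frames at once, and the "frames" are synthetic byte strings in which about 1% of the pixels change between timesteps (all sizes are assumptions chosen for the demo).

```python
import random
import zlib

FRAME_BYTES = 4096   # toy frame size (assumption)
N_FRAMES = 8         # timesteps per chunk (assumption)


def make_frames(seed=0):
    """Return N_FRAMES byte strings where each frame differs slightly from the last."""
    rng = random.Random(seed)
    frame = bytearray(rng.getrandbits(8) for _ in range(FRAME_BYTES))
    frames = []
    for _ in range(N_FRAMES):
        frames.append(bytes(frame))
        # Change roughly 1% of the pixels before the next timestep.
        for _ in range(FRAME_BYTES // 100):
            frame[rng.randrange(FRAME_BYTES)] = rng.getrandbits(8)
    return frames


frames = make_frames()

# One chunk per timestep: the compressor never sees the neighbouring frames.
per_frame_total = sum(len(zlib.compress(f, 9)) for f in frames)

# One chunk spanning all timesteps: redundancy between frames can be exploited.
joint_total = len(zlib.compress(b"".join(frames), 9))

print(f"per-timestep chunks: {per_frame_total} bytes")
print(f"multi-timestep chunk: {joint_total} bytes")
```

The trade-off is that reading a single timestep from the multi-timestep chunk means decompressing the whole chunk.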

If, instead, we want to compress each timestep independently, then AVIF might be worth looking at (which uses AV1 compression for still images).

JackKelly · 2022-01-07T14:16:29Z

Copying a Slack conversation @jacobbieker and I just had...

Jacob:

[For full-geospatial-extent Zarrs] It seems to be using around 44 MB on average per timestep for HRV and non-HRV (30 MB for non-HRV, and 14 MB for HRV), which ends up being around 380 GB per month, or 4.1 TB per year, because the RSS is shut down for 1 month of each year. So quite a bit more data. So it might be worth checking out higher compression.
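The storage estimate can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes one image every 5 minutes, a 30-day month, and decimal units (1 GB = 1000 MB); only the per-timestep sizes and the 11 active months come from the figures above.

```python
MB_PER_TIMESTEP = 30 + 14          # non-HRV + HRV, from the comment above
TIMESTEPS_PER_DAY = 24 * 60 // 5   # assumption: one image every 5 minutes
DAYS_PER_MONTH = 30                # assumption
ACTIVE_MONTHS_PER_YEAR = 11        # RSS is shut down for 1 month each year

gb_per_month = MB_PER_TIMESTEP * TIMESTEPS_PER_DAY * DAYS_PER_MONTH / 1000
tb_per_year = gb_per_month * ACTIVE_MONTHS_PER_YEAR / 1000

print(f"{gb_per_month:.0f} GB per month")   # 380 GB per month
print(f"{tb_per_year:.2f} TB per year")     # 4.18 TB per year
```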

Me:

Cool, thanks, sounds good! TBH, my guess is that (slightly) lossy compression might be fine. Although, first, it'd be great to see how well "modern" lossless compression works. AV1 video compression might be interesting, although I think we'd then have to use Zarr chunks which span multiple timesteps.

Jacob:

Yeah, we could try maybe with 3 timesteps? It would limit the downside of loading lots of frames, but could still then compress quite a bit? Downside with saving it that way is I think I'd need to then also do the data processing in chunks, rather than each timestep being separate as it is now

Me:

ah, good point, I'd forgotten about that! hmmm... I'm really split... on the one hand, video compression should result in much smaller files (because consecutive frames are pretty similar). But, it also sounds like it might be a fair amount of work! As a quick and hacky test, it might be worth manually outputting a handful of frames as TIFFs, and then using ffmpeg to encode those TIFFs as a lossless AV1 video file, and seeing what the compression ratio is like?

Jacob:

Yeah, I think benchmarking some more image compression ones first might be the easiest to try. Otherwise, I can try doing it in chunks. If we can reduce the filesize, even by 10%, we'd then save around 4TB of storage or something like that

JackKelly · 2022-01-07T14:46:34Z

Yeah, so, I'd recommend trying AVIF first (the still-image version of AV1).

If AVIF is hard to implement, then slightly lossy 8-bit JPEGs might be worth a go. I don't know for sure, but I'm pretty sceptical that our models are currently benefiting from pristine 10-bit lossless imagery! 🙂

JackKelly · 2022-01-07T14:47:45Z

The python library imagecodecs supports AVIF.

JackKelly · 2022-01-07T14:53:04Z

And here's a super-simple little python library (just 51 lines of code!) which enables jpeg-2000 compression in Zarr using imagecodecs. Maybe it'd be possible to use the same pattern, but for AVIF?

jacobbieker · 2022-01-07T14:53:46Z

Thanks! I'll try those out

JackKelly · 2022-01-07T14:54:42Z

Awesome, thanks! Right, I'll stop procrastinating tax-related tasks now... 🙂

JackKelly · 2022-01-07T15:02:49Z

And here's a useful issue, including a short guide to creating codecs for Zarr: zarr-developers/numcodecs#73

cgohlke · 2022-01-07T16:19:39Z

FWIW, the imagecodecs library includes numcodecs compatible codecs. Register with imagecodecs.numcodecs.register_codecs(). It's all work in progress but good enough to experiment with.

jacobbieker · 2022-01-11T16:29:49Z

Using jpeg2k with level=100, saving 3 channels as individual timesteps saves about 57% compared to zstd. However, jpeg2k throws an error when encoding multiple timesteps.

jacobbieker · 2022-01-13T16:18:18Z

Using bz2, which is what pbzip2 is based on, reduces the zarrs by 23%, while being lossless, and not needing special handling

jacobbieker · 2022-01-13T16:18:37Z

Higher levels for zstd didn't result in any real savings

jacobbieker · 2022-01-14T13:08:43Z

#47 might also be an easy win on compression. Tried AVIF but it doesn't do monochrome images, will try with 3 channel images soon.

jacobbieker · 2022-01-14T13:11:08Z

Fixing the dtype saves another 13% on the size when using bz2

jacobbieker · 2022-01-14T13:34:44Z

Fixing the dtype and using bz2 level 3 or 5 results in a 28% reduction in size compared to zstd
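For anyone wanting to repeat this kind of comparison, here is a minimal sketch of how a "percent reduction" figure can be measured with the standard library. The data is a synthetic 16-bit array, `zlib` is used as a stand-in baseline because zstd is not assumed to be installed, and the helper `percent_reduction` is hypothetical, so the printed numbers say nothing about the real satellite Zarrs.

```python
import array
import bz2
import zlib


def percent_reduction(baseline_bytes, new_bytes):
    """Size reduction of `new_bytes` relative to `baseline_bytes`, in percent."""
    return 100.0 * (baseline_bytes - new_bytes) / baseline_bytes


# Synthetic unsigned 16-bit "pixels" holding 10-bit values (illustrative only).
pixels = array.array("H", ((i // 7 + (i * i) % 5) % 1024 for i in range(50_000)))
raw = pixels.tobytes()

# Stand-in baseline; the thread's real baseline was zstd.
baseline = zlib.compress(raw, 6)

for level in (3, 5, 9):
    compressed = bz2.compress(raw, compresslevel=level)
    assert bz2.decompress(compressed) == raw  # bz2 is lossless
    reduction = percent_reduction(len(baseline), len(compressed))
    print(f"bz2 level {level}: {len(compressed)} bytes, {reduction:.1f}% vs baseline")
```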

JackKelly transferred this issue from openclimatefix/nowcasting_dataset Jan 7, 2022

JackKelly added the enhancement New feature or request label Jan 7, 2022

JackKelly added this to NIA: WP2 Jan 7, 2022

JackKelly moved this to Todo in NIA: WP2 Jan 7, 2022

jacobbieker self-assigned this Jan 7, 2022

This was referenced Jan 7, 2022

Benchmark candidate intermediate file formats for EUMETSAT data #13

Closed

Compress NWP and satellite batches using AV1 or AVIF openclimatefix/nowcasting_dataset#571

Open

jacobbieker mentioned this issue Jan 18, 2022

Add support for Data Tailor #49

Merged

7 tasks

jacobbieker closed this as completed in #49 Jan 19, 2022

Repository owner moved this from Todo to Done in NIA: WP2 Jan 19, 2022

kasiaocf removed this from NIA: WP2 Jan 20, 2022

kasiaocf added this to Nowcasting Jan 20, 2022

kasiaocf moved this to Todo in Nowcasting Jan 20, 2022

kasiaocf moved this from Todo to Done in Nowcasting Jan 20, 2022

JackKelly mentioned this issue Jan 27, 2022

Experiment with using lossless JPEG-XL (colorspace=YUV 400) #67

Closed

12 tasks
