Encode sequences of satellite images using modern video compression like AV1 #45

Closed
JackKelly opened this issue Oct 25, 2021 · 15 comments · Fixed by #49

Comments

@JackKelly
Member

JackKelly commented Oct 25, 2021

Video compression has developed a lot over recent years (driven by Netflix etc.)

Our sequences of satellite images and NWPs can be considered video sequences. There's lots of redundant information across frames. So, if we wanted to squish the data down as much as possible (e.g. for sharing with students; or for regularly sending to Lancium; or just for archiving many years of data without breaking the bank) then we might want to consider using video compression like AV1 to compress our satellite data and/or NWP data.

ffmpeg supports AV1 encoding, including lossless, 10-bit, and 12-bit

And ffmpeg-python supports moving data between numpy arrays and ffmpeg.

If we really wanted to, we could probably write a numcodecs-like compression library to allow us to use ffmpeg to compress stuff, and still save into NetCDF / Zarr.

In terms of pre-prepared batches, it may be far easier to save each example as a standard video file rather than trying to use AV1 within NetCDF: e.g. save each example as a sequence of TIFFs, and then ask ffmpeg to convert those TIFFs to a video file compressed using AV1. We can do this now that we're saving each modality separately :)
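
A minimal sketch of that TIFF-to-AV1 step, assuming ffmpeg is on the PATH and was built with the libaom encoder. The file names and the helper names (`build_av1_command`, `compression_ratio`) are hypothetical. Bit-depth handling (for example choosing a `-pix_fmt`) is not covered here and would need checking before trusting the output as lossless for 10-bit data.

```python
import subprocess
from pathlib import Path


def build_av1_command(frame_pattern: str, output: str, fps: int = 12) -> list[str]:
    """Build an ffmpeg command that encodes numbered TIFFs as lossless AV1."""
    return [
        "ffmpeg",
        "-framerate", str(fps),        # nominal frame rate of the image sequence
        "-i", frame_pattern,           # e.g. frames/frame_%04d.tif (hypothetical path)
        "-c:v", "libaom-av1",          # AV1 encoder; needs ffmpeg built with libaom
        "-aom-params", "lossless=1",   # ask libaom for lossless mode
        output,
    ]


def compression_ratio(frame_dir: str, output: str) -> float:
    """Total size of the input TIFFs divided by the size of the encoded video."""
    raw = sum(p.stat().st_size for p in Path(frame_dir).glob("*.tif"))
    return raw / Path(output).stat().st_size


# Usage (not run here because it needs ffmpeg and real frames):
#   subprocess.run(build_av1_command("frames/frame_%04d.tif", "frames.mkv"), check=True)
#   print(compression_ratio("frames", "frames.mkv"))
```

Building the command as a list keeps the step easy to inspect before running it on a handful of frames.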

This is not a priority, of course!

Twitter discussion.

@JackKelly JackKelly transferred this issue from openclimatefix/nowcasting_dataset Jan 7, 2022
@JackKelly
Member Author

If we want to use video compression in our Zarrs then we might have to use Zarr chunks which span multiple timesteps.

If, instead, we want to compress each timestep independently, then AVIF might be worth looking at (which uses AV1 compression for still images).
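
A rough sketch of the read cost of multi-timestep chunks, assuming a chunk is the unit of compression, so every frame in a chunk is decoded even if only one is wanted. The function names are hypothetical and the last (possibly partial) chunk of a real array is ignored.

```python
def chunks_touched(start: int, stop: int, chunk_len: int) -> range:
    """Indices of the time-chunks that overlap timesteps [start, stop)."""
    if stop <= start:
        return range(0)
    return range(start // chunk_len, (stop - 1) // chunk_len + 1)


def frames_decoded(start: int, stop: int, chunk_len: int) -> int:
    """Frames that must be decompressed to read timesteps [start, stop)."""
    return len(chunks_touched(start, stop, chunk_len)) * chunk_len


# Reading one timestep from per-timestep chunks decodes 1 frame;
# the same read from 3-timestep chunks decodes 3 frames.
print(frames_decoded(5, 6, chunk_len=1), frames_decoded(5, 6, chunk_len=3))
```

Longer chunks give a video codec more redundancy to exploit, at the price of decoding more frames per read.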

@JackKelly
Member Author

Copying a Slack conversation @jacobbieker and I just had...

Jacob:

[For full-geospatial-extent Zarrs] It seems to be using around 44 MB on average per timestep for HRV and non-HRV combined (30 MB for non-HRV, and 14 MB for HRV), which ends up being around 380 GB per month, or 4.1 TB per year, because the RSS is shut down for 1 month of each year. So quite a bit more data. So it might be worth checking out higher compression.

Me:

Cool, thanks, sounds good! TBH, my guess is that (slightly) lossy compression might be fine. Although, first, it'd be great to see how well "modern" lossless compression works. AV1 video compression might be interesting, although I think we'd then have to use Zarr chunks which span multiple timesteps.

Jacob:

Yeah, we could try maybe with 3 timesteps? It would limit the downside of loading lots of frames, but could still then compress quite a bit? Downside with saving it that way is I think I'd need to then also do the data processing in chunks, rather than each timestep being separate as it is now

Me:

ah, good point, I'd forgotten about that! hmmm... I'm really split... on the one hand, video compression should result in much smaller files (because consecutive frames are pretty similar). But, it also sounds like it might be a fair amount of work! As a quick and hacky test, it might be worth manually outputting a handful of frames as TIFFs, and then using ffmpeg to encode those TIFFs as a lossless AV1 video file, and seeing what the compression ratio is like?

Jacob:

Yeah, I think benchmarking some more image compression codecs first might be the easiest thing to try. Otherwise, I can try doing it in chunks. If we can reduce the file size, even by 10%, we'd then save around 4TB of storage or something like that.
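
The storage figures quoted in this conversation can be reproduced with a few lines of arithmetic. The 5-minute image cadence and the 30-day month are assumptions that are not stated in the thread; they were chosen because they give the quoted 380 GB per month.

```python
MB_PER_TIMESTEP = 30 + 14          # non-HRV + HRV, from the figures quoted above
TIMESTEPS_PER_DAY = 24 * 60 // 5   # assumption: one image every 5 minutes
DAYS_PER_MONTH = 30                # assumption: a nominal 30-day month
MONTHS_PER_YEAR = 11               # one month per year has no RSS data

gb_per_month = MB_PER_TIMESTEP * TIMESTEPS_PER_DAY * DAYS_PER_MONTH / 1000
tb_per_year = gb_per_month * MONTHS_PER_YEAR / 1000


def tb_saved_per_year(reduction: float) -> float:
    """Storage saved per year of data for a fractional size reduction."""
    return tb_per_year * reduction


print(f"{gb_per_month:.0f} GB/month, {tb_per_year:.2f} TB/year")
print(f"10% smaller saves {tb_saved_per_year(0.10):.2f} TB per year of data")
```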

@JackKelly JackKelly added the enhancement New feature or request label Jan 7, 2022
@JackKelly JackKelly moved this to Todo in NIA: WP2 Jan 7, 2022
@jacobbieker jacobbieker self-assigned this Jan 7, 2022
@JackKelly
Member Author

Yeah, so, I'd recommend trying AVIF first (the still-image version of AV1).

If AVIF is hard to implement, then slightly lossy 8-bit JPEGs might be worth a go. I don't know for sure, but I'm pretty sceptical that our models are currently benefiting from pristine 10-bit lossless imagery! 🙂

@JackKelly
Member Author

The python library imagecodecs supports AVIF.

@JackKelly
Member Author

And here's a super-simple little python library (just 51 lines of code!) which enables jpeg-2000 compression in Zarr using imagecodecs. Maybe it'd be possible to use the same pattern, but for AVIF?
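
For orientation, here is a dependency-free sketch of the general shape such a codec takes: `encode`, `decode`, `get_config` and `from_config`. This is not the linked library's code. It uses the standard-library `bz2` as a stand-in for an image encoder, and the class name and `codec_id` are hypothetical. A real codec would subclass `numcodecs.abc.Codec` and be registered with `numcodecs.registry.register_codec`.

```python
import bz2


class Bz2FrameCodec:
    """Toy codec following the encode/decode/get_config shape used by numcodecs."""

    codec_id = "bz2_frame_demo"  # hypothetical id, not a registered codec

    def __init__(self, level: int = 5):
        self.level = level

    def encode(self, buf) -> bytes:
        # A real AVIF codec would call the image encoder here instead of bz2.
        return bz2.compress(bytes(buf), self.level)

    def decode(self, buf, out=None):
        data = bz2.decompress(bytes(buf))
        if out is not None:
            # Copy into the caller's buffer when one is supplied.
            memoryview(out).cast("B")[: len(data)] = data
            return out
        return data

    def get_config(self) -> dict:
        # The config is what gets stored in the array metadata.
        return {"id": self.codec_id, "level": self.level}

    @classmethod
    def from_config(cls, config: dict) -> "Bz2FrameCodec":
        config = dict(config)
        config.pop("id", None)
        return cls(**config)
```

Swapping the two `bz2` calls for an image encoder and decoder is the part that would differ for AVIF; the config round trip stays the same.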

@jacobbieker
Member

Thanks! I'll try those out

@JackKelly
Member Author

Awesome, thanks! Right, I'll stop procrastinating tax-related tasks now... 🙂

@JackKelly
Member Author

And here's a useful issue, including a short guide to creating codecs for Zarr: zarr-developers/numcodecs#73

@cgohlke

cgohlke commented Jan 7, 2022

FWIW, the imagecodecs library includes numcodecs compatible codecs. Register with imagecodecs.numcodecs.register_codecs(). It's all work in progress but good enough to experiment with.

@jacobbieker
Member

Using jpeg2k with level=100, saving 3 channels as individual timesteps saves about 57% compared to zstd. The jpeg2k codec throws an error when encoding multiple timesteps, though.

@jacobbieker
Member

Using bz2, which is what pbzip2 is based on, reduces the size of the Zarrs by 23%, while being lossless and not needing special handling.

@jacobbieker
Member

Higher levels for zstd didn't result in any real savings

@jacobbieker
Member

#47 might also be an easy win on compression. Tried AVIF, but it doesn't do monochrome images. Will try with 3-channel images soon.

@jacobbieker
Member

Fixing the dtype saves another 13% on the size when using bz2

@jacobbieker
Member

Fixing the dtype and using bz2 level 3 or 5 results in a 28% reduction in size compared to zstd
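
A small sketch of how this kind of comparison could be set up, using only the standard library. Everything here is an assumption for illustration: the data is a synthetic gradient rather than satellite imagery, `zlib` stands in for zstd to keep the sketch dependency-free, and the dtype change shown (float32 to uint16) is a guess, since the thread does not say what the dtype fix was. The numbers it prints say nothing about the real percentages reported above.

```python
import bz2
import zlib
from array import array

# Synthetic "image": a smooth gradient, standing in for real satellite data.
values = [(x + y) // 4 % 1024 for y in range(256) for x in range(256)]

as_float32 = array("f", values).tobytes()   # 4 bytes per pixel
as_uint16 = array("H", values).tobytes()    # 2 bytes per pixel


def compressed_sizes(raw: bytes) -> dict[str, int]:
    """Compressed size in bytes for a few lossless settings."""
    sizes = {f"bz2-{level}": len(bz2.compress(raw, level)) for level in (3, 5, 9)}
    sizes["zlib-6"] = len(zlib.compress(raw, 6))  # stand-in for zstd
    return sizes


for name, raw in (("float32", as_float32), ("uint16", as_uint16)):
    print(name, len(raw), compressed_sizes(raw))
```

Running the same loop over real chunks, with the actual codecs, would give figures comparable to the ones in this thread.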

Repository owner moved this from Todo to Done in NIA: WP2 Jan 19, 2022
@kasiaocf kasiaocf removed this from NIA: WP2 Jan 20, 2022
@kasiaocf kasiaocf moved this to Todo in Nowcasting Jan 20, 2022
@kasiaocf kasiaocf moved this from Todo to Done in Nowcasting Jan 20, 2022