OxiPNG Zopfli compression seems to be overly slow compared to zopflipng's #414
I've just realized that the zopfli repo has this related issue, so I'd say this low performance is reproducible, and it has basically stayed the same over the years: carols10cents/zopfli#37. However, these findings seem to indicate that zopflipng must be using the Zopfli compression routine in a different way, because otherwise I see no reasonable explanation for these differences. |
Considering neither carols10cents/zopfli nor dfrankland/zopfli-rs have been updated since 2018, and that zopfli-rs appears to perform slightly better than zopfli, it might be worth switching from zopfli to zopfli-rs. It wouldn't fix this issue, but it seems like it should give a slight performance improvement. |
* Update and optimize dependencies
These changes update the dependencies to their latest versions, fixing some known issues that prevented doing so in the first place. In addition, the direct dependency on byteorder was dropped in favor of stdlib functions that have been stabilized for some time in Rust, and the transitive dependency on chrono, pulled by stderrlog, was also dropped, which had been affected by security issues and improperly maintained in the past:
- cardoe/stderrlog-rs#31
- https://www.reddit.com/r/rust/comments/ts84n4/chrono_or_time_03/
* Run rustfmt
* Bump MSRV to 1.56.1
Updating to this patch version should not be cumbersome for end-users, and it is required by a transitive dependency.
* Bump MSRV to 1.57.0
os_str_bytes requires it.
* Add initial support for changing Zopfli iterations
PR #445 did some dependency updates, which included using the latest zopfli version. The latest version of this crate exposes new options in its API that allow users to choose the desired number of Zopfli compression iterations, which may greatly affect execution time. In fact, other optimizers such as zopflipng dynamically select this number depending on the input file size (see: #414). As a first step towards making OxiPNG deal with Zopfli better, let's add the necessary options for libraries to be able to choose the number of iterations. This number is still fixed to 15 as before when using the CLI.
* Fix Clippy lint
Co-authored-by: Josh Holmer <jholmer.in@gmail.com>
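To make the "select the iteration count depending on input size" idea from the commit message concrete, here is a minimal sketch. The `IterationPolicy` type and its field names are hypothetical and belong to neither OxiPNG nor the zopfli crate. The numbers used in the usage note are illustrative assumptions as well; anyone who wants to mirror zopflipng should check its source for the actual values.

```rust
/// Hypothetical policy for choosing a Zopfli iteration count from the size
/// of the data about to be compressed. Not part of OxiPNG or the zopfli crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IterationPolicy {
    /// Iterations used when the input is below `large_input_threshold` bytes.
    pub small_input_iterations: u32,
    /// Iterations used when the input is at or above the threshold.
    pub large_input_iterations: u32,
    /// Size in bytes from which an input counts as "large" (assumed value).
    pub large_input_threshold: usize,
}

impl IterationPolicy {
    /// Returns the iteration count to use for an input of `input_len` bytes.
    pub fn iterations_for(&self, input_len: usize) -> u32 {
        if input_len < self.large_input_threshold {
            self.small_input_iterations
        } else {
            self.large_input_iterations
        }
    }
}
```

With an assumed policy of 15 iterations below 200,000 bytes and 5 iterations otherwise, a 2048x2048 RGBA image (16,777,216 bytes of raw pixel data) would fall in the "large" bucket. A library user could then pass the chosen number to whatever iteration option the compressor exposes.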
I recently switched from OptiPNG to OxiPNG (primarily using it within Greenshot). I did some benchmarking with OptiPNG and OxiPNG, comparing my old and new CPUs: a few days ago I switched from an Intel i7 7700k (4 cores with SMT) to an AMD Ryzen 5800X3D (8 cores with SMT). With OptiPNG I get around 9% more speed on the Ryzen, which is expected because it only uses one core. With OxiPNG I only got 10% more speed, which is more or less just the faster single-core performance of the new Ryzen. So I guess OxiPNG is only using up to 4 cores. For me this is still a big step forward, because OxiPNG is around 5 times faster here than OptiPNG. But I was expecting OxiPNG to be around 120% faster on my new Ryzen CPU, not only 10%. So my guess: zopflipng is twice as fast as OxiPNG because it uses up to 8 cores? It would be interesting to know which CPU @AlexTMjugador used to take the time measurements in the first post; maybe he is using an 8-core CPU too. Maybe there is some potential to split the workload of OxiPNG into more threads? |
I took those numbers on an old Intel Core i3-2100 CPU clocked at 3.2 GHz (BCLK overclock), which has two cores with SMT. About the threading speedup: if I recall correctly, OxiPNG spawns one thread per optimization strategy it tries, and the set of possible strategies is fixed depending on the image and options. Therefore, more hardware threads will only help up to the point where there are no more strategies to try in parallel. If there are more hardware threads than strategies, you won't get any further speedup. Also, as I stated before, the vanilla zopflipng does not use several threads at all (maybe OptiPNG does, however), so I don't think that threading explains the performance difference here. |
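The threading argument above can be put into numbers with a small sketch. It assumes one thread per strategy, that all strategies cost about the same, and that nothing else limits scaling; the function names are made up for illustration and are not OxiPNG APIs.

```rust
/// Number of threads that can actually do work when each strategy runs on
/// its own thread (hypothetical helper, not an OxiPNG function).
pub fn useful_threads(hardware_threads: usize, strategies: usize) -> usize {
    hardware_threads.min(strategies).max(1)
}

/// Ideal speedup over a single thread, assuming equally expensive strategies
/// that are processed in "rounds" of at most `useful_threads` at a time.
pub fn ideal_speedup(hardware_threads: usize, strategies: usize) -> f64 {
    if strategies == 0 {
        return 1.0;
    }
    let threads = useful_threads(hardware_threads, strategies);
    // Ceiling division: how many rounds are needed to run every strategy.
    let rounds = (strategies + threads - 1) / threads;
    strategies as f64 / rounds as f64
}
```

Under these assumptions, 4 strategies give the same ideal speedup on 8 hardware threads as on 16, and a run that tries a single combination (as in the `-f0 -Z` log in the original post) cannot benefit from extra threads at all.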
Oh, okay. Today I tried the same OxiPNG arguments as you on my system. I'm a little surprised by the -Z argument. It takes around 4 minutes on my system and produces much bigger PNG files than the default OxiPNG settings, which need just 2.8 seconds. Why should I use the -Z argument if it takes 85 times longer to produce bigger PNG files? What is the advantage of -Z? |
The However, OxiPNG tries fewer optimization strategies when Zopfli compression is enabled to compensate for the increased compression cost. It looks like your PNGs are more effectively optimized by trying several strategies with a not-so-extreme compressor than by trying fewer strategies with a better compressor 😉 |
In my test case I used a PNG file of 8,682,324 bytes. With the -Z argument, OxiPNG compressed that image down to 8,325,841 bytes. Without the -Z argument, OxiPNG compressed the file to 6,348,630 bytes, 85 times faster. So I'll stay without the -Z argument. ;) |
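For anyone who wants to compare such results as percentages, here is a small sketch of the arithmetic. `percent_saved` is a made-up helper name, not something OxiPNG provides.

```rust
/// Percentage of the original size that was saved by optimization.
/// Returns 0.0 for an empty original so the division is always defined.
/// (Hypothetical helper for illustration only.)
pub fn percent_saved(original_bytes: u64, optimized_bytes: u64) -> f64 {
    if original_bytes == 0 {
        return 0.0;
    }
    (original_bytes as f64 - optimized_bytes as f64) / original_bytes as f64 * 100.0
}
```

Applied to the sizes quoted above, the -Z run saves roughly 4.1% of the file, while the default settings save roughly 26.9%.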
Would you mind sharing that image? Maybe it's useful to investigate if something is going wrong with Zopfli here 😄 |
Well, in fact it is a desktop screenshot (3 displays, 6400x1440 pixels) with a background picture and desktop icons, created with Greenshot. I would share it in private with a developer, but not in public here. |
There's a new zopfli patch version with some performance improvements. Could someone provide a PR, please? I'm not familiar with Rust myself, but I started using oxipng and noticed the slowness too. Thanks! |
The performance improvements of the new But sure, I could open a PR to update the Zopfli version that OxiPNG locks in its |
I don't mind if there are no gains :) I thought there was a small improvement already, that's why I asked for it. I'm glad to hear you have the issue on your radar, so just ignore my comment above :) |
Patch with updated and tested crates (libdeflater, zopfli, clap) is #495 |
Big thanks to @AlexTMjugador for the latest zopfli updates! Have you tried running that same |
I didn't try that yet, but I could give it a shot in a few days or so 😄 Thanks for the reminder by the way! |
I tried to replicate the experiment described in my original issue comment as closely as possible, but I had to use a different input image and run the commands on a much faster computer, so the exact performance and size reduction figures are not comparable. Nevertheless, I think that the results are interesting, so I'm sharing them.
time ./oxipng_latest_86fccf0 -v -t1 -f0 -Z --out /dev/null input.png This binary was generated with
time ./oxipng_v8.0.0 -v -t1 -f0 -Z --out /dev/null input.png This binary was generated with
time ./oxipng_v5.0.0 -v -t1 -f0 -Z --out /dev/null input.png This binary was generated with
time zopflipng --iterations=15 --filters=0 -y input.png /dev/null The
In light of these results, my conclusions are:
|
That is... surprising 😯 |
When I run the result of
So |
Hm, twice as fast. Good luck with the profiling, it would be amazing if you could close that gap 😁 |
While using OxiPNG with the Zopfli compression mode, I noticed that some images took an unusually long time to compress. Of course, this is somewhat expected due to the usage of Zopfli compression, which is slow by design. However, some quick benchmarks against zopflipng, with the most similar compression settings possible, showed that zopflipng is both faster and more effective at optimizing images than OxiPNG with Zopfli compression, which is an interesting result.
In particular, to try to get an apples to apples comparison as much as possible, I fixed the following parameters:
Of course, I always used the same unprocessed image, input.png. The results were as follows:
$ time target/release/oxipng -v -t1 -f0 -Z input.png
Processing: input.png
    2048x2048 pixels, PNG format
    4x8 bits/pixel, RGBA
    IDAT size = 419799 bytes
    File size = 422604 bytes
Trying: 1 combinations
    zc = 0 zs = 0 f = 0 399505 bytes
Found better combination:
    zc = 0 zs = 0 f = 0 399505 bytes
    IDAT size = 399505 bytes (20294 bytes decrease)
    file size = 402310 bytes (20294 bytes = 4.80% decrease)
Output: input.png
target/release/oxipng -v -t1 -f0 -Z input.png output.png 513,45s user 0,16s system 99% cpu 8:34,03 total
$ time zopflipng --iterations=15 --filters=0 -y input.png output.png
Optimizing input.png
Input size: 422604 (412K)
Result size: 399568 (390K). Percentage of original: 94.549%
Result is smaller
zopflipng --iterations=15 --filters=0 -y input.png output.png 252,55s user 0,09s system 99% cpu 4:13,16 total
These results show that, for the same input image, zopflipng was about twice as fast as OxiPNG, while also managing to compress the image a bit more.
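As a sanity check on the "about twice as fast" figure, the wall-clock totals from the two `time` outputs can be converted to seconds and divided. The sketch below only handles the `minutes:seconds` form seen in these logs (with a comma or a dot as decimal separator), and both function names are made up for illustration.

```rust
/// Parses a wall-clock total such as "8:34,03" (minutes:seconds) into seconds.
/// Accepts a comma or a dot as decimal separator. Longer forms such as
/// hours:minutes:seconds are not handled and yield `None`.
pub fn parse_wall_clock(text: &str) -> Option<f64> {
    let (minutes, seconds) = text.trim().split_once(':')?;
    let minutes: u64 = minutes.parse().ok()?;
    let seconds: f64 = seconds.replace(',', ".").parse().ok()?;
    Some(minutes as f64 * 60.0 + seconds)
}

/// How many times longer the slower run took than the faster one.
/// Returns `None` when the faster time is not a positive number.
pub fn speed_ratio(slower_secs: f64, faster_secs: f64) -> Option<f64> {
    if faster_secs > 0.0 {
        Some(slower_secs / faster_secs)
    } else {
        None
    }
}
```

Feeding in "8:34,03" for OxiPNG and "4:13,16" for zopflipng gives a ratio of a little over 2, in line with the claim above.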
I believe that these results are interesting, because some quick println! debugging showed that OxiPNG spends most of its execution time in this function; more precisely, in the zopfli::compress call:
oxipng/src/deflate/mod.rs, lines 52 to 63 in 8053211
Both PNG optimization programs use Zopfli for compression, and the zopfli crate is a straightforward translation of the original Zopfli C code to Rust, which should have similar performance (further quick tests show that the zopfli crate binary compresses files about as fast as the upstream Zopfli binary). So neither the compression algorithm itself nor its Rust implementation seems to be to blame, but for some reason OxiPNG is still much slower than zopflipng, even though both programs should end up compressing similar amounts of pixel data.
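For anyone who wants to repeat the println!-style timing mentioned above, a generic wrapper around `std::time::Instant` is enough. This is only a sketch: `timed` and `stand_in_compress` are hypothetical names, and the stand-in is a trivial run-length encoder used so the example runs without the zopfli crate. In a real measurement the closure would wrap the actual compression call.

```rust
use std::time::{Duration, Instant};

/// Runs `work` and returns its result together with the elapsed wall-clock time.
pub fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed())
}

/// Stand-in for an expensive compression call (NOT zopfli::compress).
/// Encodes the input as (run length, byte) pairs, with runs capped at 255.
pub fn stand_in_compress(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut bytes = data.iter().copied().peekable();
    while let Some(byte) = bytes.next() {
        let mut run: u8 = 1;
        // Extend the run while the next byte matches and the counter has room.
        while run < u8::MAX && bytes.peek() == Some(&byte) {
            bytes.next();
            run += 1;
        }
        out.push(run);
        out.push(byte);
    }
    out
}
```

A call such as `let (compressed, elapsed) = timed(|| stand_in_compress(&pixels));` followed by `eprintln!("compress took {:?}", elapsed);` shows how much of the total run time a single call accounts for.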
Has anyone else managed to reproduce this performance difference? What might be causing it?