Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: Include a CI check that ensures Oppia's images are fully compressed. #21274

Open
seanlip opened this issue Nov 17, 2024 · 41 comments
Labels
enhancement Label to indicate an issue is a feature/improvement Impact: High Blocks or significantly slows down a core workflow. Work: Medium The means to find the solution is clear, but it isn't at good-first-issue level yet.

Comments

@seanlip
Copy link
Member

seanlip commented Nov 17, 2024

Is your feature request related to a problem? Please describe.

We have encountered issues in the past where, due to large image files being added to the Oppia repository, we are unable to deploy the code because the build is too large. On further examination, these issues occurred due to large binary images included in the code, and compressing the images fixed the issue.

In order for this to not recur, we would like to implement a CI check that fails and blocks a PR when images in the PR are not properly optimized.

Describe the solution (or solutions) you'd like

Implement a CI check that either scans all images in a PR (or scans all images in the codebase), and compresses them using a tool like trimage.

The check should print out the following data about each image: filename; original size; compressed size.

If the size of any image in the PR/codebase exceeds the compressed image's size by 1% or more, fail the CI check and note the image that failed. The error message should provide instructions to the developer about how to run trimage on the problematic images and replace them in their PR (or better still, upload the compressed images as an artifact so that the developer can download and replace them directly).

Describe alternatives you've considered and rejected

The 1% allowance is optional but gives a bit of leeway. It's fine to make that percentage smaller, but it probably shouldn't be 0% unless trimage behaves fully deterministically (I haven't investigated this though).

Additional context

No response

@seanlip seanlip added triage needed enhancement Label to indicate an issue is a feature/improvement and removed triage needed labels Nov 17, 2024
@seanlip seanlip added Work: Medium The means to find the solution is clear, but it isn't at good-first-issue level yet. Impact: High Blocks or significantly slows down a core workflow. labels Nov 18, 2024
@souja450
Copy link

I want to fix it.

@seanlip
Copy link
Member Author

seanlip commented Nov 19, 2024

@souja450 Per the guidance at https://github.com/oppia/oppia/wiki/Contributing-code-to-Oppia#choosing-a-good-first-issue, please provide an explanation of what your PR will do (with names of files you're changing, what you plan to change in each file, etc.). If it looks good, we can assign you to this issue.

Please also follow the other instructions on that wiki page if you have not yet done so. Thanks!

@Ashu463
Copy link

Ashu463 commented Nov 19, 2024

Hey @seanlip attached a PR, could you please review it once ? One more thing this PR isn't fix auto compression of image which I'll add soon.

@seanlip
Copy link
Member Author

seanlip commented Nov 19, 2024

@Ashu463 Let me know when you have a complete solution, and I'll review it then. Thanks.

@Ashu463
Copy link

Ashu463 commented Nov 19, 2024

I have fixed this in this PR. Have a look there.

@Ashu463
Copy link

Ashu463 commented Nov 19, 2024

Hey @seanlip here is solution which I thought to apply ->
Proposed solution is:

  1. To make a python script wihch takes care of all the logics of checking size of image and return responses accordingly.
  2. Using this script inside the CI check, which will further run thorugh docker-compose .
    Following the same architecture as used in oppia itself in many places e.g. lint, python type check, etc.

@seanlip
Copy link
Member Author

seanlip commented Nov 20, 2024

@Ashu463 I took a look, thanks. I think your general approach is fine but the image compression check should use the same mechanism as the solution you are recommending (trimage in this case), otherwise there's a risk that contributors will use that tool and still fail the check. Can you make the necessary updates?

@Ashu463
Copy link

Ashu463 commented Nov 21, 2024

@seanlip I am just confused what you are trying to say but what I get is you want trimage to compress the exceeded images on which I'm working right now. The intial solution was made to just show my interest over the issue.

@Ashu463
Copy link

Ashu463 commented Dec 5, 2024

Hey @seanlip is there any other tool available for compresing, as trimage causes an unknown error whose solution I hadn't found anywhere ?
Screenshot from 2024-12-05 21-26-06

@seanlip
Copy link
Member Author

seanlip commented Dec 5, 2024

Have you tried searching for the error message? I did a search and got this as the very first result: https://stackoverflow.com/questions/69994530/qt-qpa-plugin-could-not-find-the-qt-platform-plugin-wayland

@Ashu463
Copy link

Ashu463 commented Dec 5, 2024

Yeah I do try those but got another error as QSocketNotifier: Can only be used with threads started with QThread, whose available solutions are not working even. I think it'll take time to debug, will inform you if issue persists.
Also Is there any necessity of using trimage ? Why couldn't we work any other alternative ?

@Ashu463
Copy link

Ashu463 commented Dec 6, 2024

Hey @seanlip with my research trimage uses a library GLIBC and trimage is not maintained properly to be compatible with GLIBC 2.39 latest version, and thus there were very less answers to help out there.

@seanlip
Copy link
Member Author

seanlip commented Dec 9, 2024

Thanks for taking a look, @Ashu463. Yes, it looks like trimage hasn't been updated for 3 years or so.

It's fine to pick a different library, we should just make sure it compresses images in a lossless manner. Feel free to post your findings here.

@Ashu463
Copy link

Ashu463 commented Dec 12, 2024

Hey @seanlip sorry for the delay in reply but now my exams are over and will share updates soon.
Thanks

@Ashu463
Copy link

Ashu463 commented Dec 14, 2024

Hey @seanlip I will recommend to use imagemagick as it is very similar to trimage and supports almost 200+ image extensions, also this works fine with the script.
Also I have a doubt i.e. how much should the size of compressed image be ? I mean while checking the size how much should I keep the criteria, coz if it is 1% of current size then in every iteration of running script wouldn't image size gets reduced every time ?

@seanlip
Copy link
Member Author

seanlip commented Dec 16, 2024

@Ashu463 I don't have an issue with imagemagick if it works fine.

In terms of the size check, we should compress losslessly as much as possible. The idea then is that if the same compression is run on any existing image in Oppia then it would compress to that same final size (the 1% is just an error tolerance). This way, we can detect whether an image hasn't been run through the compression pipeline yet. Does that make sense?

@Ashu463
Copy link

Ashu463 commented Dec 16, 2024

@seanlip got it, will share you a demo soon.
Thanks !

@Ashu463
Copy link

Ashu463 commented Dec 17, 2024

Hey @seanlip here is the demo, and the approach is :

  • Firstly I am compressing image with the help of in built Python Image Library (PIL).
  • Then I compare compressed image and original img to check their 1% tolerance.
  • If compressed img size is still bigger than original img then I compress it using ImageMagick.
    In the given video medium.webp is the perfect example of compression with the help of ImageMagick. Here is the video link if video gets stuck somewhere.
Screencast.from.2024-12-17.21-44-02.mp4

@seanlip
Copy link
Member Author

seanlip commented Dec 18, 2024

@Ashu463 Can you link to documentation that confirms that the compression you're doing with both PIL and ImageMagick is lossless? I need to make sure that this is the case so that the compression doesn't result in sub-quality images. Thanks.

@Ashu463
Copy link

Ashu463 commented Dec 18, 2024

Hey @seanlip there is no such official documentary where ImageMagick claimed to compress images losslessly, but here is link of some chat or FAQ opened by them officially where someone had discussed about its lossless nature.
Also PIL couldn't be continued as it reduces image quality, which I didn't researched while using it earlier for which I am apologized.

@seanlip
Copy link
Member Author

seanlip commented Dec 19, 2024

@Ashu463 Thanks. I think more research is needed. We need authoritative documentation that the methods you are using for this are lossless.

@Ashu463
Copy link

Ashu463 commented Dec 19, 2024

@seanlip fine, will update you soon with much legit ones.
Thanks !

@Ashu463
Copy link

Ashu463 commented Dec 22, 2024

Hey @seanlip following are my some observation while searching :

  • I get official documentation where they provide a lot of info about all supported extension out of which .jpeg, .png, .webp, is useful only. You can get to read about them Supported Image Formats section of the above provided link.
  • Here are my observations about their lossless nature of compression, as you can see the attached screenshots where lines about lossless nature for various extension is present. Here is it's link where you can get the same what I shared just by searching keyword lossless over whole page.

Screenshot from 2024-12-22 20-49-56
Screenshot from 2024-12-22 20-49-42

@seanlip
Copy link
Member Author

seanlip commented Dec 24, 2024

Thanks @Ashu463. So how should we proceed? Can you compare the performance here to e.g. trimage or other libraries so that we can compare the different options and make an informed decision?

@Ashu463
Copy link

Ashu463 commented Dec 24, 2024

@seanlip Yeah sure I could compare it with various tools but how could I show that to you, attach directly ss or something else ?

@Ashu463
Copy link

Ashu463 commented Dec 25, 2024

Hey @seanlip here is I think you need, I am not attaching any documentation link or ss but if you need then pls let me know once I'll attach it. Following features are the advantages or the reasons why should we choose ImageMagick over other tools:

  • Lossless Compression : It will only remove metadata within the image rather than ruining its actual quality, which is our primary aim.
  • Vast Extension : It can support most used image extension, moreover it support extensions which are used in oppia itself (.jpg, .jpeg, .png, .webp)

But ImageMagick do have some loopholes listed as :

  • ImageMagick is known to consume a lot of memory and CPU resources for large files or batch processing, due to which it is quite time consuming while doing compressions.
  • While ImageMagick supports multiple formats, its optimization for specific formats is not so advanced, but it would work for us cause we do need heavy compression.

With these pros and cons, I would conclude that Imagemagick definitely have some issues but it can be totally fine to use it as cons could be neglected over its pros.

@seanlip
Copy link
Member Author

seanlip commented Dec 26, 2024

@Ashu463 Are there no other tools to consider besides ImageMagick? It would be good to see a list of tools and their pros/cons, with links to supporting documentation if possible.

I am not really sure about ImageMagick's "removing metadata only" issue. Usually we want to compress the actual image (not just its metadata) without reducing its quality.

@Ashu463
Copy link

Ashu463 commented Dec 26, 2024

Hey @seanlip I am sharing a doc as I thought it would get too messy if I explain comparisons here. DOC LINK
Thanks !

@Ashu463
Copy link

Ashu463 commented Dec 29, 2024

Hey @seanlip had you gone through it ?

@seanlip
Copy link
Member Author

seanlip commented Dec 29, 2024

Yes, I've just taken a look, thanks for the ping.

Note that you only need to consider JPEG, PNG and WEBP (don't worry about SVG). I think that including a table that demonstrates, for each (tool, filetype) pair, the (a) compression percentage and the (b) time needed for compression would be helpful (and that links to the resulting images). You can try using representative images that are already in the Oppia codebase.

Also, I think it's OK to use different libraries for different file types if needed. Just note that all compression must be lossless.

Could you please update the doc accordingly and ping this thread when you'd like a follow-up review? Thanks.

@Ashu463
Copy link

Ashu463 commented Dec 29, 2024

Yeah sure, will update here soon as I am busy in issue#21294.
Thanks !

@Ashu463
Copy link

Ashu463 commented Dec 31, 2024

Hey @seanlip it is quite impossible to collect exact data for compression percentage and time needed from official documentation, as these compression tools docs never talks about these thing. For these data we have to rely on some chats said somewhere which can't be declared official. Moreover Compression percentage could be find with the help of two sizes of image before and after the compression. And same for the time, so what should I do for making that table ?

@seanlip
Copy link
Member Author

seanlip commented Dec 31, 2024

Just run the pipelines on some sample images and record that data.

@Ashu463
Copy link

Ashu463 commented Dec 31, 2024

Hey could you also help me here #21568 ?

@Ashu463
Copy link

Ashu463 commented Dec 31, 2024

Hey @seanlip I'd updated the doc with table consist of tool name, compression %age, and time while compression.
PTAL Thanks.

@seanlip
Copy link
Member Author

seanlip commented Jan 2, 2025

@Ashu463 Thanks. Please provide "anyone can comment" access to your doc.

Could you please also provide links, from the doc, to the initial and final images in each case, so that we can check the quality? I would also suggest you use a small and a large image for each case so that we can see how each of those is handled. Try to use images from the Oppia website/codebase where possible -- thanks.

@Ashu463
Copy link

Ashu463 commented Jan 2, 2025

Hey @seanlip I'd updated doc as whatever you said, picked one small and one large image of .JPEG and .PNG extensions and detailed about them. Also there will be suprise for .webp extension. Moreover I attached original image and compressed image by various tools over doc. Note that all the image I used is from oppia's repository itself.
PTAL thanks !

@Ashu463
Copy link

Ashu463 commented Jan 4, 2025

Hey @seanlip had you gone through doc ?

@seanlip
Copy link
Member Author

seanlip commented Jan 5, 2025

@Ashu463 Sorry for the late reply, I'm getting a lot of emails each day and it's hard to respond to them all. Thanks for the ping to look at this again.

Based on your analysis, I think GraphicMagick is probably the way to go since the runtime is faster than optipng and the resulting images look reasonable. Does this give you enough information to proceed? We can do the compression for all jpg, png and webp images (not svg).

Thanks!

@Ashu463
Copy link

Ashu463 commented Jan 5, 2025

It's all fine @seanlip, Yeah I can proceed with GraphicMagick now for all except svg, Now it's the time to show you some demo which I will share soon. Please assign this to me now so that I can open PR as well.
Thanks !

@seanlip
Copy link
Member Author

seanlip commented Jan 5, 2025

@Ashu463 You'd need to show proof that you have it working, similar to this, before we can assign you. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Label to indicate an issue is a feature/improvement Impact: High Blocks or significantly slows down a core workflow. Work: Medium The means to find the solution is clear, but it isn't at good-first-issue level yet.
Projects
Status: Todo
Development

Successfully merging a pull request may close this issue.

3 participants