
Issue #1271 added LZWDecode compression #1286

Merged


opposss commented Oct 16, 2024

Fixes #1271

Added support for image compression using LZWDecode.
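For context, the LZWDecode filter is described in the PDF specification (ISO 32000-1, section 7.4.4). A stream starts with the clear-table code 256 and ends with the end-of-data code 257. Codes are packed most-significant-bit first, and their width grows from 9 to 12 bits, one code early by default (EarlyChange = 1). What follows is a minimal sketch of an encoder written from that description. It is not the code merged in this PR: `lzw_encode` and the constant names are hypothetical, and it has only been checked by hand against the specification's worked example.

```python
CLEAR_TABLE, EOD, FIRST_CODE = 256, 257, 258
MAX_BITS = 12
MAX_CODE = (1 << MAX_BITS) - 1  # 4095, last usable table entry


def lzw_encode(data: bytes) -> bytes:
    """Hypothetical LZWDecode-compatible encoder (EarlyChange = 1)."""
    out = bytearray()
    acc = 0          # pending bits not yet written to `out`
    nbits = 0        # how many bits `acc` currently holds
    width = 9        # current code width in bits
    next_code = FIRST_CODE
    table = {}       # (prefix code, next byte) -> code of the longer sequence

    def emit(code: int) -> None:
        nonlocal acc, nbits
        acc = (acc << width) | code
        nbits += width
        while nbits >= 8:            # write whole bytes, most significant bits first
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1      # keep only the unwritten bits

    def bump() -> None:
        # Count one more table entry as the decoder will see it; widen after
        # entries 511, 1023 and 2047 exist, never beyond 12 bits.
        nonlocal next_code, width
        next_code += 1
        if next_code > (1 << width) - 1 and width < MAX_BITS:
            width += 1

    emit(CLEAR_TABLE)                # streams start with a clear-table code
    prefix = None                    # code of the longest match so far
    for byte in data:
        if prefix is None:
            prefix = byte
            continue
        key = (prefix, byte)
        if key in table:             # extend the current match
            prefix = table[key]
            continue
        emit(prefix)
        if next_code <= MAX_CODE:    # room left: remember the new sequence
            table[key] = next_code
            bump()
        else:                        # table full: emit a clear code and start over
            emit(CLEAR_TABLE)
            table.clear()
            next_code, width = FIRST_CODE, 9
        prefix = byte
    if prefix is not None:
        emit(prefix)
        bump()                       # the decoder adds an entry on this last code too
    emit(EOD)
    if nbits:                        # pad the final partial byte with zero bits
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)
```

For example, `lzw_encode(bytes([45, 45, 45, 45, 45, 65, 45, 45, 45, 66]))` returns `bytes.fromhex("800B6050220C0C8501")`, the byte sequence given in that worked example. Keying the table on `(prefix code, next byte)` pairs rather than on byte strings avoids building a new `bytes` object for every input byte.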

Checklist:

  • The GitHub pipeline is OK (green),
    meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.

  • A unit test is covering the code added / modified by this PR

  • This PR is ready to be merged

  • In case of a new feature, docstrings have been added, along with some documentation in the docs/ folder

  • A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

opposss requested a review from gmischler as a code owner on October 16, 2024, 00:13
opposss (author) commented Oct 16, 2024

Hi @Lucas-C, I added support for compression using LZWDecode and would like you to look at my code and give your opinion. I tested this filter manually by adding images compressed with LZWDecode to PDF files, and it seems to work correctly. Local runs of pylint and black showed no errors. After the review, if the code is OK, I can write unit tests.
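Regarding unit tests: one way to test an LZW encoder without relying on a PDF viewer is a round trip, decoding its output with a separate decoder and comparing the result with the original input. Here is a minimal decoder sketch, assuming the default EarlyChange = 1 behaviour of the PDF LZWDecode filter; `lzw_decode` is a hypothetical helper, not part of fpdf2.

```python
CLEAR_TABLE, EOD, FIRST_CODE = 256, 257, 258
MAX_BITS = 12


def lzw_decode(data: bytes) -> bytes:
    """Hypothetical LZWDecode decoder (EarlyChange = 1), for round-trip tests."""
    out = bytearray()
    acc = nbits = pos = 0
    width = 9
    # Codes 0-255 are literal bytes; 256 and 257 are control codes (placeholders).
    table = [bytes([i]) for i in range(256)] + [b"", b""]
    prev = None  # sequence decoded from the previous code
    while True:
        while nbits < width:  # pull in bytes until a full code is available
            if pos >= len(data):
                return bytes(out)  # stream ended without an EOD code
            acc = (acc << 8) | data[pos]
            pos += 1
            nbits += 8
        nbits -= width
        code = acc >> nbits  # the top `width` bits
        acc &= (1 << nbits) - 1
        if code == EOD:
            return bytes(out)
        if code == CLEAR_TABLE:
            del table[FIRST_CODE:]
            width, prev = 9, None
            continue
        if code < len(table):
            entry = table[code]
        elif code == len(table) and prev is not None:
            entry = prev + prev[:1]  # code being defined by this very step
        else:
            raise ValueError(f"invalid LZW code {code}")
        out += entry
        if prev is not None and len(table) < (1 << MAX_BITS):
            table.append(prev + entry[:1])
            # EarlyChange = 1: widen once the next free code reaches
            # 511, 1023 or 2047, never beyond 12 bits.
            if len(table) + 1 >= (1 << width) and width < MAX_BITS:
                width += 1
        prev = entry
```

For example, decoding `bytes.fromhex("800B6050220C0C8501")` gives back the ten bytes 45 45 45 45 45 65 45 45 45 66 from the worked example in the PDF specification's LZWDecode section, so an assertion such as `lzw_decode(encoded) == original` could be run over a few sample images.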

Lucas-C (member) commented Oct 16, 2024


Good job 👍

I'll try to review this PR very soon, today or tomorrow.

Review comments on fpdf/image_parsing.py (outdated, resolved)
Lucas-C (member) left a comment


Hi @opposss

Good job overall 👍

You placed the code in the right place, and it's clear.

I'll finish the code review once unit tests have been added, but it's a promising start!

Review comment on fpdf/image_parsing.py (outdated, resolved)
opposss requested a review from Lucas-C on October 19, 2024, 15:23
Lucas-C (member) commented Oct 22, 2024

OK, so there is one thing currently blocking the GitHub Actions pipeline:

6.1.10-1 (1): test/image/image_types/image_types_insert_jpg_lzwdecode.pdf

This is the relevant VeraPDF rule: https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Part-1-rules/#rule-6110-1

I think the best fix is to add this rule (6.1.10-1) to verapdf-ignore.json, with the reason: `fpdf2 wants to support LZWDecode filter`

Lucas-C merged commit 8d7cbf1 into py-pdf:master on Oct 23, 2024
11 checks passed
Lucas-C (member) commented Oct 23, 2024

Thank you for your contribution @opposss 👍

@allcontributors please add @opposss for code

@Lucas-C I've put up a pull request to add @opposss! 🎉

Lucas-C (member) commented Nov 21, 2024

I noticed today that the unit test test_insert_jpg_lzwdecode is quite slow to execute: ~78s on my computer.

Based on this quick test, 90% of that execution time is spent in pack_codes_into_bytes():

pip install pytest-profiling
pytest test/image/image_types/test_insert_images.py -k lzwdecode --profile

I wonder if this could be improved...
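One plausible cause worth checking in that profile (an assumption, not a diagnosis of the actual fpdf2 code) is a packer that shifts every code into a Python `int` accumulator but never discards the bits it has already written. The accumulator then grows with the whole output, each shift copies it, and packing becomes quadratic in the stream size. Below is a minimal sketch of a linear-time packer that masks the accumulator after each code and replays the PDF code-width schedule (9 to 12 bits, EarlyChange = 1); `pack_lzw_codes` is a hypothetical name.

```python
from typing import Iterable

CLEAR_TABLE, EOD, FIRST_CODE = 256, 257, 258
MAX_BITS = 12


def pack_lzw_codes(codes: Iterable[int]) -> bytes:
    """Hypothetical linear-time packer for LZWDecode codes (EarlyChange = 1)."""
    out = bytearray()
    acc = 0        # bits not yet written; always fewer than 8 between codes
    nbits = 0
    width = 9
    next_code = FIRST_CODE
    for code in codes:
        acc = (acc << width) | code
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        # Discard the bits just written. Without this mask `acc` would grow with
        # the whole output, and every shift above would copy it: O(n^2) overall.
        acc &= (1 << nbits) - 1
        if code == CLEAR_TABLE:
            width, next_code = 9, FIRST_CODE
        elif code != EOD:
            # Replay the table growth: one step per data code, widening after
            # entries 511, 1023 and 2047, never beyond 12 bits.
            next_code += 1
            if next_code > (1 << width) - 1 and width < MAX_BITS:
                width += 1
    if nbits:  # pad the last partial byte with zero bits
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)
```

For example, `pack_lzw_codes([256, 45, 258, 258, 65, 259, 66, 257])` produces the bytes 80 0B 60 50 22 0C 0C 85 01. Because `acc` never holds more than 7 bits between codes, each shift works on a small integer regardless of how many codes have already been packed.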

opposss (author) commented Nov 21, 2024

@Lucas-C
Apparently, there is a problem with encoding large amounts of data in the JPEG case; with PNG and other formats, everything seems to work fine.

I changed the implementation of pack_codes_into_bytes() a bit, but so far I've only managed to reduce the time from 80s to ~40s.
I will try to improve the performance further in the coming days.

Issue this pull request may close: Add support for LZWDecode compression
2 participants