PDF Read/Merge generates broken documents since update 2.10.6 #1344
Comments
Thanks for locating the issue.
The PDF has some fonts whose names include spaces (encoded as #20). The decoding handles that properly, but the writing does not: it leaves invalid spaces in the names, which leads to invalid fields. Fix in progress.
@Merinorus
Hi @pubpub-zz,
Three kinds of changes were made in this PR:
1) _cmap.py: the str comes from `/Encoding`, which stores a NameObject. The conversion is already performed; no need to force it.
2) _page.py: replaced an obsolete call in _debug_for_extract()
3) _base.py:
3.1) unnumber: all `#xx` substitutions should be performed prior to conversion to str (using utf-8) to allow multi-language text
3.2) read_from_stream: if utf-8 (normally the only one required) or gbk (kept to prevent regression) we will use charmap to get some sequence of chars
3.3) renumber: added to recode into `#xx` sequences. renumber will also be compatible with utf-8 chars
Closes #1344
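For readers unfamiliar with `#xx` escapes in PDF names, the unnumber/renumber idea described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the function names `escape_name` and `unescape_name` are hypothetical, the set of escaped bytes is an assumption based on the general PDF name syntax, and the gbk/charmap fallback mentioned in 3.2 is left out.

```python
import re

# Bytes that are written as-is: printable ASCII, minus PDF delimiters and '#'.
# (Assumed set for this sketch; whitespace falls outside 0x21..0x7E anyway.)
_DELIMITERS = b"()<>[]{}/%#"


def escape_name(name: str) -> bytes:
    """Hypothetical 'renumber': UTF-8 encode, then write unsafe bytes as #xx."""
    out = bytearray(b"/")
    for byte in name.encode("utf-8"):
        if 0x21 <= byte <= 0x7E and byte not in _DELIMITERS:
            out.append(byte)
        else:
            out += b"#%02X" % byte
    return bytes(out)


def unescape_name(raw: bytes) -> str:
    """Hypothetical 'unnumber': resolve all #xx on the bytes, then decode."""
    body = raw[1:] if raw.startswith(b"/") else raw
    resolved = re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        body,
    )
    # Decoding only after every #xx is resolved keeps multi-byte UTF-8
    # sequences intact.
    return resolved.decode("utf-8")
```

With this sketch, a font name such as "Arial Bold" is written as `/Arial#20Bold` instead of being emitted with a raw space, which is the kind of invalid output reported in this issue.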
The fixing PR was just merged. I will make a release to PyPI in a few minutes.
@Merinorus Thank you for reporting the issue! We value issue reports and feedback on whether the fixes worked. If you want, I would add you as a contributor: https://pypdf2.readthedocs.io/en/latest/meta/CONTRIBUTORS.html
@MartinThoma That's very kind! I accept your offer. ;)
Hello,
Since update 2.10.6, some PDF documents are not merged correctly. Same with version 2.10.7.
Previous versions (2.10.5 and below) behave correctly.
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-Ubuntu-20.04-focal
Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-debian-11.2
$ python3 -c "import PyPDF2;print(PyPDF2.__version__)"
2.10.6
Code + PDF
This is a minimal, complete example that shows the issue:
Here is the PDF that caused the issue:
input.pdf
Here is the output (simple PdfReader -> PdfMerger):
output.pdf
Traceback
This is the complete Traceback I see:
Thank you for taking the time to investigate.
The document is an official French form. I assume it is fine to use in automated tests, but I am not sure.