Closed
Description
I'm writing a tool to collate pages from several PDFs. I found it incredibly easy to crash my program because the bindings do not model the lifetimes of references in QPDF very well. For example, using QPdf::read_from_memory
requires that the buffer outlives the QPdf
object, but that is not enforced using Rust's type system:
use qpdf::*;
fn main() {
let usage = "crash <PDF>";
let buf = std::fs::read(std::env::args().nth(1).expect(usage)).unwrap();
// Works fine:
let source = QPdf::read_from_memory(&buf).unwrap();
let sink = QPdf::empty();
sink.add_page(&source.get_pages().unwrap()[0], false).unwrap();
// Segfaults:
let source = QPdf::read_from_memory(buf).unwrap();
let sink = QPdf::empty();
sink.add_page(&source.get_pages().unwrap()[0], false).unwrap();
}
Further, I have found this problem while writing out the resulting PDF. QPDF's documentation mentions this (https://qpdf.readthedocs.io/en/latest/design.html#copying-objects-from-other-pdf-files), but even if I explicitly use this, I still need to keep the source buffer around, which again the Rust type system does not enforce:
use qpdf::*;
fn main() {
let usage = "crash <PDF>";
let buf = std::fs::read(std::env::args().nth(1).expect(usage)).unwrap();
let source = QPdf::read_from_memory(&buf).unwrap();
let sink = QPdf::empty();
// I understand this to make a deep copy.
let page = sink.copy_from_foreign(&source.get_pages().unwrap()[0]);
// So we should be able to drop source here, and we can.
drop(source);
// But if we drop buf, we will segfault...
drop(buf);
sink.add_page(page, false).unwrap();
sink.writer()
.static_id(true)
.force_pdf_version("1.7")
.normalize_content(true)
.preserve_unreferenced_objects(false)
.object_stream_mode(ObjectStreamMode::Preserve)
.compress_streams(false)
.stream_data_mode(StreamDataMode::Preserve)
// ... while writing the PDF here:
.write_to_memory().unwrap();
}
Metadata
Metadata
Assignees
Labels
No labels
Activity
ancwrd1 commentedon Jan 21, 2023
Nice catch, thanks! It's actually not mentioned in QPDF C bindings that the qpdf_read_memory expects the buffer to not be freed.
I think the best option here is to use the same lifetime for QPdf and the buffer, otherwise we need to make a copy and waste a memory.
ancwrd1 commentedon Jan 21, 2023
On a second thought making the buffer owned is probably a much better option in terms of ergonomics and usage.
ancwrd1 commentedon Jan 21, 2023
Regarding foreign objects: I think it's probably a bug in QPDF, I'll check it further and open an issue there if needed.
Proper fix for #2
ancwrd1 commentedon Jan 21, 2023
Should be fixed in bb9dff0
Will publish version 0.1.5
teythoon commentedon Jan 23, 2023
Sweet, thanks for the quick fixes!