gptpdf

Uses a visual large language model (VLLM), such as GPT-4o, to parse PDFs into Markdown.

Our method parses typesetting, mathematical formulas, tables, pictures, charts, etc. almost perfectly.

Average price per page: $0.013
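
As a rough illustration of what this average implies, the sketch below estimates the cost of parsing a document of a given length. The helper name estimate_cost is hypothetical and not part of gptpdf, and the real cost will vary with the model and the content of each page.

```python
AVERAGE_PRICE_PER_PAGE_USD = 0.013  # average quoted in this README; actual cost varies


def estimate_cost(num_pages: int) -> float:
    """Return a rough parsing cost estimate in USD (hypothetical helper)."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    return round(num_pages * AVERAGE_PRICE_PER_PAGE_USD, 3)


# Estimate for a 15-page paper
print(estimate_cost(15))
```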

This package uses the GeneralAgent library to interact with the OpenAI API.

Process steps

1. Use the PyMuPDF library to parse the PDF and extract all non-text areas.
2. Convert all non-text areas of the PDF into images and number them.
3. Mark the non-text areas and their numbers on each page of the PDF and save each page as an image.
4. Based on the images from step 3, use a large visual model (such as GPT-4o) to parse them and obtain the Markdown content.
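
The first three steps can be sketched with PyMuPDF roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual code: it treats only embedded raster images as non-text areas, and the function names and the file-naming scheme are hypothetical. The package's real logic may differ.

```python
from pathlib import Path


def region_image_name(page_index: int, region_index: int) -> str:
    """Hypothetical naming scheme: '<page>_<region>.png'."""
    return f"{page_index}_{region_index}.png"


def mark_non_text_regions(pdf_path: str, output_dir: str) -> list[str]:
    """Crop, number, and outline image regions on every page (illustrative only)."""
    import fitz  # PyMuPDF; imported lazily so the helper above works without it

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    doc = fitz.open(pdf_path)
    for page_index, page in enumerate(doc):
        # Step 1: collect bounding boxes of embedded images (assumed non-text areas).
        rects = [fitz.Rect(info["bbox"]) for info in page.get_image_info()]
        rects = [rect for rect in rects if not rect.is_empty]
        # Step 2: render each region to its own numbered image file.
        for region_index, rect in enumerate(rects):
            target = out / region_image_name(page_index, region_index)
            page.get_pixmap(clip=rect).save(str(target))
            saved.append(str(target))
        # Step 3: outline each region, write its number, and save the whole page.
        for region_index, rect in enumerate(rects):
            page.draw_rect(rect, color=(1, 0, 0), width=1)
            label_position = fitz.Point(rect.x0 + 2, rect.y0 + 10)
            page.insert_text(label_position, str(region_index), color=(1, 0, 0))
        page.get_pixmap().save(str(out / f"page_{page_index}.png"))
    doc.close()
    return saved
```

The regions are cropped before the outlines are drawn so that the red markers do not end up inside the cropped images. The annotated page images are what step 4 would send to the visual model.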

DEMO

See examples/attention_is_all_you_need/output.md for the parsed output of examples/attention_is_all_you_need.pdf.

Installation

pip install gptpdf

Usage

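from gptpdf import parse_pdf

pdf_path = 'examples/attention_is_all_you_need.pdf'  # path to the PDF to parse
api_key = 'Your OpenAI API Key'
# Returns the Markdown text and the paths of the extracted images
content, image_paths = parse_pdf(pdf_path, api_key=api_key)
print(content)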

See test/test.py for more examples.

API

parse_pdf(pdf_path, output_dir='./', api_key=None, base_url=None, model='gpt-4o', verbose=False)

Parses a PDF file into a Markdown file and returns the Markdown content and all image paths.

pdf_path: path to the PDF file

output_dir: output directory, where all images and the Markdown file are stored

api_key: OpenAI API key (optional). If not provided, the OPENAI_API_KEY environment variable is used.

base_url: OpenAI base URL (optional). If not provided, the OPENAI_BASE_URL environment variable is used.

model: OpenAI vision LLM model; the default is 'gpt-4o'. You can also use qwen-vl-max.

verbose: verbose mode
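
The fallback described for api_key and base_url can be pictured with the small sketch below. It only mimics the documented behavior (explicit argument first, then environment variable); the helper resolve_openai_settings is hypothetical and is not how gptpdf is necessarily implemented.

```python
import os


def resolve_openai_settings(api_key=None, base_url=None):
    """Mimic the documented fallback: explicit argument first, then environment."""
    return {
        "api_key": api_key or os.environ.get("OPENAI_API_KEY"),
        "base_url": base_url or os.environ.get("OPENAI_BASE_URL"),
    }
```

In practice this means that, with OPENAI_API_KEY set in the environment, a call such as parse_pdf(pdf_path, output_dir='./output') can omit api_key entirely; the './output' directory here is only an example value.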
