Extract tables from scanned image PDFs using Optical Character Recognition.
-
Updated
Jun 9, 2020 - Python
Extract tables from scanned image PDFs using Optical Character Recognition.
BoxDetect is a Python package based on OpenCV which allows you to easily detect rectangular shapes like character or checkbox boxes on scanned forms.
This batch script creates a searchable PDF of a PDF with one or more scanned pages which contain images.
Debian packaging of pdfbeads
A wrapper on top of python-OCR tools such as pytesseract and easyocr, to recognize and extract text embedded in images. Also, convert scanned-PDFs to text searchable PDFs.
Growing collection of scripts that manipulate text data.
Add a description, image, and links to the scanned-image-pdfs topic page so that developers can more easily learn about it.
To associate your repository with the scanned-image-pdfs topic, visit your repo's landing page and select "manage topics."