Commit

Multiple pdf file correction
EnodyG authored Jul 12, 2024
1 parent 83fe45d; commit 6348126
Showing 1 changed file with 6 additions and 3 deletions.
utilities/convert.py: 9 changes (6 additions, 3 deletions)
```diff
@@ -51,6 +51,9 @@ def convert_pdf_to_text(file):
         images = convert_from_bytes(file)
     else:
         images = convert_from_path(file)
-    for i,img in enumerate(images):
-        extraction = (pytesseract.image_to_string(img)[:-1])
-    return extraction
+    extraction = []
+    for img in images:
+        text = pytesseract.image_to_string(img)
+        extraction.append(text)
+
+    return " ".join(extraction)
```
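The removed loop reassigned `extraction` on every page, so a multi-page PDF came back as the text of its last page only; the added loop collects every page and joins them with a space. The sketch below isolates that difference without needing `pdf2image` or Tesseract installed. It assumes OCR can be modelled as any callable from one page image to a string (a hypothetical stand-in for `pytesseract.image_to_string`), and `extract_before` / `extract_after` are illustrative names, not functions in `utilities/convert.py`. The sample pages end in `"\f"` because Tesseract typically terminates each page's text with a form feed, which the old `[:-1]` appears to have trimmed and the new code keeps.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-in for pytesseract.image_to_string:
# any callable that turns one page "image" into its text.
OcrFn = Callable[[object], str]


def extract_before(images: Iterable[object], ocr: OcrFn) -> str:
    """Mirrors the removed loop: `extraction` is overwritten on each
    iteration, so only the last page's text is returned."""
    extraction = ""  # initialised here only so the sketch runs with zero pages
    for img in images:
        extraction = ocr(img)[:-1]  # drops the final character of the page
    return extraction


def extract_after(images: Iterable[object], ocr: OcrFn) -> str:
    """Mirrors the added loop: keep every page's text, then join with a space."""
    extraction: List[str] = []
    for img in images:
        extraction.append(ocr(img))
    return " ".join(extraction)


if __name__ == "__main__":
    pages = ["first page\f", "second page\f", "third page\f"]

    def fake_ocr(page: object) -> str:
        # Hypothetical: each "image" here is already its recognised text.
        return str(page)

    print(repr(extract_before(pages, fake_ocr)))  # 'third page'
    print(repr(extract_after(pages, fake_ocr)))   # 'first page\x0c second page\x0c third page\x0c'
```

If the form feeds are unwanted in the joined output, stripping each page (for example `text.rstrip("\f")`) before appending would be one option, though the commit itself does not do this.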
