A POC that uses the GPT-4 Vision API to generate a digital form from an image, using JSON Forms (https://jsonforms.io/).
💭 Inspired by:
- screenshot-to-code: https://github.com/abi/screenshot-to-code
- draw-a-ui: https://github.com/SawyerHood/draw-a-ui
Both repositories demonstrate that the GPT-4 Vision API can be used to generate a UI from an image and can recognize the patterns and structure of the layout shown in the image.
A demo video is available on YouTube.
A browser-only version is hosted at https://nathanfhh.github.io/Digital-Form-with-GPT4-Vision-API/. There, I am using pdf.js to process the PDF file and sending the request to OpenAI's API, so the response is generated entirely in the browser.
To run it locally:

- `cd` into the frontend directory:

```sh
cd ai-json-form
```
- Install packages and run the dev server:

```sh
npm install
npm run dev
```
- `cd` into the backend directory:

```sh
cd backend
```
- Install packages:

```sh
poetry install
# alternatively, you can use pip
pip install -r requirements.txt
```
- Set up environment variables:

```sh
export OPENAI_API_KEY=
# optional
export OPENAI_ORG=
```

If you plan to use the Mock response only, OPENAI_API_KEY still has to be set, but any value will do.
- Run the backend:

```sh
python main.py
```
Alternatively, run everything with Docker Compose:

- Write the environment variables to a `.env` file:

```sh
echo "OPENAI_API_KEY=YOUR_API_KEY" > .env
# The following is optional
echo "OPENAI_ORG=YOUR_ORG" >> .env
```
- Run Docker Compose:

```sh
docker-compose up --build
```
- Open the browser and visit http://localhost:8080/aijsv/
I am new to Vue, so the code might not follow best practices. I am still learning and improving; if you have any suggestions, feel free to open a PR.
How it works:

- Upload PDF files of up to three pages from the frontend. If you want to adjust the page limit, change the `MAX_PDF_PAGES` variable in `backend/app/socket.py`.
- When the backend receives the PDF file as a Base64 string, it does the following (see the pipeline sketch after this list):
  - Converts the data-URL string back to bytes.
  - Reads the PDF file, converts it to JPG images, and saves them to the /tmp folder using the `pdf2image` package.
  - Extracts the text from the same PDF file using the `PyPDF2` package; the extracted text becomes part of the prompt sent to the GPT-4 model to improve accuracy.
  - Prepares the prompts and sends them, along with the PDF screenshots, to the GPT-4 Vision API.
  - Streams each response chunk to the frontend via Socket.IO incrementally.
- Whenever the frontend receives a chunk, it appends it to the `codemirror` editor and checks whether the current content is valid YAML. If it is, the content is applied to the JSON Forms schema to force the UI to re-render (see the second sketch below).
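Roughly, the backend pipeline can be sketched as follows. This is a simplified, hypothetical illustration rather than the actual code in `backend/app/socket.py`: the event names (`generate`, `chunk`), the handler name, and the prompt text are assumptions, the images are re-encoded in memory instead of being written to /tmp, and error handling is omitted.

```python
# Hypothetical sketch of the backend pipeline; see backend/app/socket.py for the real code.
import base64
import io

import socketio
from openai import OpenAI
from pdf2image import convert_from_bytes
from PyPDF2 import PdfReader

MAX_PDF_PAGES = 3                       # page limit enforced by the backend
sio = socketio.AsyncServer(async_mode="asgi")
client = OpenAI()                       # reads OPENAI_API_KEY from the environment


@sio.on("generate")                     # assumed event name
async def handle_pdf(sid, data_url: str):
    # 1. Convert the data-URL string back to bytes.
    pdf_bytes = base64.b64decode(data_url.split(",", 1)[1])

    # 2. Render each page to a JPG (pdf2image) and re-encode it as Base64
    #    so it can be attached to the Vision request.
    image_parts = []
    for page in convert_from_bytes(pdf_bytes)[:MAX_PDF_PAGES]:
        buf = io.BytesIO()
        page.save(buf, format="JPEG")
        b64 = base64.b64encode(buf.getvalue()).decode()
        image_parts.append(
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
        )

    # 3. Extract the text of the same PDF (PyPDF2) to enrich the prompt.
    reader = PdfReader(io.BytesIO(pdf_bytes))
    extracted_text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # 4. Send the prompt and the screenshots to the GPT-4 Vision API, streaming the reply.
    stream = client.chat.completions.create(
        model="gpt-4-vision-preview",
        stream=True,
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Generate a JSON Forms schema (as YAML) for this form.\n"
                         "Text extracted from the PDF:\n" + extracted_text},
                *image_parts,
            ],
        }],
    )

    # 5. Forward each chunk to the frontend via Socket.IO as it arrives.
    for part in stream:
        delta = part.choices[0].delta.content
        if delta:
            await sio.emit("chunk", delta, to=sid)
```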
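On the frontend side, the loop boils down to append, try to parse, and apply only when the buffer is valid YAML. The snippet below illustrates that idea in Python with PyYAML purely for compactness; the real implementation is JavaScript in `ai-json-form`, and the names used here (`on_chunk`, `current_schema`) are illustrative only.

```python
# Illustration of the frontend's validate-then-apply loop (the actual code is JavaScript).
import yaml

buffer = ""           # mirrors the codemirror editor content
current_schema = {}   # the last valid schema handed to JSON Forms


def on_chunk(chunk: str) -> None:
    """Called for every chunk streamed from the backend."""
    global buffer, current_schema
    buffer += chunk                    # append the chunk, as the editor does
    try:
        parsed = yaml.safe_load(buffer)
    except yaml.YAMLError:
        return                         # not valid YAML yet; wait for more chunks
    if isinstance(parsed, dict):
        current_schema = parsed        # valid YAML: re-render the form with it


# Feed a small JSON-Schema-like document in two chunks; only the second
# produces valid YAML, so the form is re-rendered exactly once.
on_chunk("type: object\nproperties:\n  name: {type: string, title: Full")
on_chunk(" name}\n")
print(current_schema)
```

Checking that the parsed value is a mapping (and not, say, a bare string from a half-finished line) avoids re-rendering the form with partial content while the stream is still in flight.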