
# PDF to Digital Form using GPT-4 Vision API

A proof of concept that uses the GPT-4 Vision API to generate a digital form from an image, using [JSON Forms](https://jsonforms.io/).

💭 Inspired by:

1. [screenshot-to-code](https://github.com/abi/screenshot-to-code)
2. [draw-a-ui](https://github.com/SawyerHood/draw-a-ui)

Both repositories demonstrate that the GPT-4 Vision API can generate a UI from an image and can recognize the patterns and structure of the layout shown in it.

*Image generated by DALL-E 3.*

## Demo 🤓

Click the thumbnail to watch the demo video on YouTube.

## Try it on my GitHub Page 🚀

https://nathanfhh.github.io/Digital-Form-with-GPT4-Vision-API/

This version uses pdf.js to process the PDF file and sends requests to OpenAI's API directly, so the response is generated entirely in the browser.

## Running in a Local Environment 💻

### Frontend

1. `cd` into the frontend directory

   ```sh
   cd ai-json-form
   ```

2. Install the packages and run the dev server

   ```sh
   npm install
   npm run dev
   ```

### Backend

1. `cd` into the backend directory

   ```sh
   cd backend
   ```

2. Install the packages

   ```sh
   poetry install
   # alternatively, you can use pip
   pip install -r requirements.txt
   ```

3. Set up the environment variables

   ```sh
   export OPENAI_API_KEY=
   # optional
   export OPENAI_ORG=
   ```

   If you plan to use only the mock response, you can set `OPENAI_API_KEY` to any placeholder value.

4. Run the server

   ```sh
   python main.py
   ```

## Running with Docker 🐳

1. Export the environment variables

   ```sh
   echo "OPENAI_API_KEY=YOUR_API_KEY" > .env
   # The following is optional
   echo "OPENAI_ORG=YOUR_ORG" >> .env
   ```

2. Run docker-compose

   ```sh
   docker-compose up --build
   ```

3. Open a browser and visit http://localhost:8080/aijsv/

## Disclaimer

I am new to Vue, so the code might not follow best practices. I am still learning and improving. If you have any suggestions, please feel free to open a PR.

## Flow Explained

1. Upload a PDF file of up to three pages from the frontend.

   If you want to adjust the page limit, change the `MAX_PDF_PAGES` variable in `backend/app/socket.py`.

2. When the backend receives the PDF file as a Base64 string, it does the following (see the sketch after this list):

   - Converts the data URL string back to bytes.
   - Reads the PDF file, converts it to JPG images, and saves them to the `/tmp` folder using the `pdf2image` package.
   - Extracts the text from the same PDF file using the `PyPDF2` package. The extracted text becomes part of the prompt sent to the GPT-4 model to improve accuracy.
   - Prepares the prompts and sends them, along with the PDF screenshots, to the GPT-4 Vision API.
   - Streams the response chunks to the frontend incrementally via Socket.IO.

3. Whenever the frontend receives a chunk, it appends it to the CodeMirror editor and checks whether the current content is valid YAML. If it is, it applies the content to the JSON Schema to force the UI to re-render (see the second sketch below).
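
Below is a minimal sketch of the backend pipeline from step 2. The handler name, the `sio` server object, the `"chunk"` event name, and the prompt text are illustrative assumptions; the actual implementation lives in `backend/app/socket.py` and differs in detail:

```python
# Sketch of the step-2 pipeline: decode -> render -> extract text -> GPT-4
# Vision -> stream chunks over Socket.IO. Names here are assumptions.
import base64
import io

import socketio                       # python-socketio
from openai import OpenAI
from pdf2image import convert_from_bytes
from PyPDF2 import PdfReader

MAX_PDF_PAGES = 3                     # the page limit mentioned above
sio = socketio.AsyncServer(async_mode="asgi")
client = OpenAI()                     # reads OPENAI_API_KEY from the environment


@sio.event
async def pdf_upload(sid, data_url):
    # 1. Convert the Base64 data URL back to raw bytes.
    pdf_bytes = base64.b64decode(data_url.split(",", 1)[1])

    # 2. Render the pages to JPG images with pdf2image (requires poppler).
    images = convert_from_bytes(pdf_bytes, fmt="jpeg")[:MAX_PDF_PAGES]

    # 3. Extract the text with PyPDF2 to enrich the prompt.
    reader = PdfReader(io.BytesIO(pdf_bytes))
    extracted = "\n".join(page.extract_text() or "" for page in reader.pages)

    # 4. Build a multimodal message: prompt text plus one image per page.
    content = [{
        "type": "text",
        "text": "Generate a JSON Forms schema for this form.\n"
                f"Text extracted from the PDF:\n{extracted}",
    }]
    for img in images:
        buf = io.BytesIO()
        img.save(buf, format="JPEG")
        b64 = base64.b64encode(buf.getvalue()).decode()
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })

    # 5. Call the GPT-4 Vision API with streaming enabled and forward each
    #    chunk to the frontend over Socket.IO as it arrives.
    stream = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{"role": "user", "content": content}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            await sio.emit("chunk", delta, to=sid)
```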
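
Step 3 runs as JavaScript in the browser, but the incremental-parse idea is easy to show. Here is the same logic sketched in Python with PyYAML for illustration only; `on_chunk` and `apply_to_json_forms` are hypothetical names, not the frontend's actual functions:

```python
# The step-3 idea: buffer streamed chunks and re-render only when the
# accumulated text parses as valid YAML.
import yaml

buffer = ""

def on_chunk(chunk: str) -> None:
    """Append each streamed chunk and re-render only when the buffer parses."""
    global buffer
    buffer += chunk
    try:
        schema = yaml.safe_load(buffer)   # raises while the YAML is incomplete
    except yaml.YAMLError:
        return                            # wait for more chunks
    if isinstance(schema, dict):
        apply_to_json_forms(schema)       # hypothetical re-render hook

def apply_to_json_forms(schema: dict) -> None:
    # Placeholder: the real app feeds the schema to JSON Forms to re-render.
    print("re-rendering with:", schema)
```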