Document Extraction Service

This repository contains the code for the Document Extraction Service.

Further details on this service can be found in the Design Doc.

Server

System Requirements

These instructions assume you are on macOS or a Unix system.

It also assumes you have Python 3 (>= Python 3.9.6) installed and running.

Setup

Quick Setup

Note: For your convenience, there is a setup script you can run in lieu of manually running the commands.

Please run this from the terminal in the repository root:

chmod +x server_setup.sh
./server_setup.sh
source venv/bin/activate

Manual Setup

To get setup with the server, run the following instructions in your terminal.

Note: The below snippets assume the repo root is your current directory.

Create a virtual environment:

python3 -m venv {environment name, i.e. "venv"}

Activate your virtual environment
```
source {environment name}/bin/activate
```
Install the dependencies from the requirements.txt file located in the server/ directory:
```
python3 -m pip install -r server/requirements.txt
```
Poppler

In addition to the Python requirements, to use the package pdf2image, you will need to install poppler.

On macOS you can do this via homebrew by running brew install poppler from the terminal. More information is provided here in the pdf2image README.

Note: This might take a bit of time if homebrew wasn't updated recently.

Environment Variables

OpenAI API Key

The server utilizes the OpenAI API which requires an API Key. You can get an API Key on the API Keys page of OpenAI's website.

Once retreived, place this key in the environment variable OPENAI_API_KEY on your system via the shell config (i.e. in ~/.zshrc, ~/.bashrc etc). Note: You will need to restart your terminal or source the profile after changing the config.

export OPENAI_API_KEY='<your-key>'

Running the server locally

From you terminal, change directories to the server/ directory:

cd server/

Then, run the following django command:

python3 manage.py runserver

The server should be running on port 8000 and you should be able to ping it via http://127.0.0.1:8000/ or http://localhost:8000.

Client

The client application is an example NextJS application that makes use of the extractor service and a Feathery form. It is meant to be a playground to show what kind of data is extracted by the service and how it might be displayed in a UX.

System Requirements

These instructions assume you are running node version >= 21.5.0 and npm >= 10.2.4.

Setup

Change directories into the client/ directory and run the following command from your terminal:

npm install

This will install all of the dependencies.

Running Locally

To run the NextJS application locally, from your terminal run the command:

npm run dev

The web application will be running at http;//localhost:3000.

Note: for the web app to work, the Django extractor service must also be running. See the Server section above for more details.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
client		client
server		server
.gitignore		.gitignore
README.md		README.md
server_setup.sh		server_setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Document Extraction Service

Server

System Requirements

Setup

Quick Setup

Manual Setup

Environment Variables

Running the server locally

Client

System Requirements

Setup

Running Locally

About

Releases

Packages

Languages

soundsbywise/doc-extractor

Folders and files

Latest commit

History

Repository files navigation

Document Extraction Service

Server

System Requirements

Setup

Quick Setup

Manual Setup

Environment Variables

Running the server locally

Client

System Requirements

Setup

Running Locally

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages