Expense Parser Demo (Python)

Description

This repository constructs an end-to-end pipeline on Google Cloud Platform (GCP) to process expenses (i.e., receipts) with the Document AI API. It serves as sample code for building your own demo and is not tested for production use.

Visualizing the workflow

(Diagram: GCP workflow for the demo)

GCP Services used in the Demo

Steps to re-create this demo in your own GCP environment

  1. Create a Google Cloud Platform Project

  2. Enable the Cloud Document AI API, Cloud Functions API, and Cloud Build API in the project you created in step #1 (equivalent gcloud commands are sketched below).
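
If you prefer the command line, steps #1 and #2 can also be done with gcloud. This is a sketch; it assumes the Google Cloud CLI is installed and uses my-project-id as a placeholder for your project ID:

# Create a project (skip if you already have one), point gcloud at it,
# and enable the required APIs
gcloud projects create my-project-id
gcloud config set project my-project-id
gcloud services enable documentai.googleapis.com cloudfunctions.googleapis.com cloudbuild.googleapis.com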

  3. If you do not have access to the parser, request access via this link. Here is a link to the official Expense Parser documentation.

  4. Create a service account that will later be used by Cloud Functions

    1. Navigate to IAM & Admin -> Service Accounts
    2. Click on Create a service account
    3. In the Service account name section, type in process-receipt-example or a name of your choice
    4. Click Create and continue
    5. Grant this service account the following roles:
      • Storage Admin
      • BigQuery Admin
      • Document AI API User
    6. Click Done; you should see this service account listed on the IAM main page (see the gcloud sketch below for an equivalent)
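
The console steps above have a rough gcloud equivalent. This is a sketch assuming the service account name process-receipt-example and the placeholder project ID my-project-id; verify the role IDs against the roles listed above if anything differs:

# Create the service account that the Cloud Function will run as
gcloud iam service-accounts create process-receipt-example

# Grant it the roles listed above
for role in roles/storage.admin roles/bigquery.admin roles/documentai.apiUser; do
  gcloud projects add-iam-policy-binding my-project-id \
    --member="serviceAccount:process-receipt-example@my-project-id.iam.gserviceaccount.com" \
    --role="$role"
done
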
  5. Create your Doc AI processor

    • At this point, your request from Step 3 should be approved and you should have access to the expense parser
    • Navigate to console -> Document AI -> processors
    • Click Create processor and choose expense parser
    • Name your processor and click Create
    • Take note of your processor's region (e.g., us) and processor ID; you will need both later (a REST-based alternative for creating the processor is sketched below)
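
If you prefer not to use the console, the processor can also be created via the Document AI REST API. The call below is only a sketch: it assumes the us location, the EXPENSE_PROCESSOR processor type, and the placeholder project ID my-project-id, so verify the details against the Expense Parser documentation. The processor ID you need later appears in the name field of the response.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{"type": "EXPENSE_PROCESSOR", "displayName": "expense-parser-demo"}' \
  "https://us-documentai.googleapis.com/v1/projects/my-project-id/locations/us/processors"
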
  6. Activate Cloud Shell and clone this GitHub repository in your shell using the command:

gh repo clone GoogleCloudPlatform/document-ai-samples

Note: If you are using your local terminal, please follow this link to install and initialize the Google Cloud CLI before this step.
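
If you do not use the gh CLI, plain git works as well:

git clone https://github.com/GoogleCloudPlatform/document-ai-samples.git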

  7. Execute the Bash shell scripts in your Cloud Shell terminal to create the cloud resources (i.e., Google Cloud Storage buckets, Pub/Sub topics, Cloud Functions, and a BigQuery dataset and table)

    1. Change directory to the scripts folder

      cd community/expense-parser-python
    2. Update the following values in .env.local:

      • PROJECT_ID should match your current project's ID
      • BUCKET_LOCATION is where you want the raw receipts to be stored
      • CLOUD_FUNCTION_LOCATION is where your code executes
      • CLOUD_FUNCTION_SERVICE_ACCOUNT should be the same name you created in Step 4
      vim .env.local
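
      For reference, a filled-in .env.local might look like the following; the values are illustrative, so substitute your own project ID, locations, and the service account name from Step 4:

      PROJECT_ID=my-project-id
      BUCKET_LOCATION=US
      CLOUD_FUNCTION_LOCATION=us-central1
      CLOUD_FUNCTION_SERVICE_ACCOUNT=process-receipt-example
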
    3. Make your .sh files executable

      chmod +x set-up-pipeline.sh
    4. Change directory to the cloud functions folder

      cd cloud-functions
    5. Update the following values in .env.yaml (from your note in Step 5):

      • PARSER_LOCATION
      • PROCESSOR_ID
      vim .env.yaml
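
      For reference, .env.yaml might then look like this; both values are illustrative and should come from your note in Step 5:

      PARSER_LOCATION: us
      PROCESSOR_ID: "1234567890abcdef"
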
    6. Go back to the original folder and execute your .sh files to create cloud resources

      cd ..
      ./set-up-pipeline.sh
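
      Once the script finishes, you can sanity-check the created resources with generic listing commands, for example:

      gcloud storage ls          # buckets, including <project_id>-input-receipts
      gcloud pubsub topics list  # Pub/Sub topics
      gcloud functions list      # deployed Cloud Functions
      bq ls                      # BigQuery datasets
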
  8. Testing/Validating the demo

    1. Upload a sample receipt to the input bucket (<project_id>-input-receipts); see the command sketch after this list
    2. At the end of processing, you should see your BigQuery table populated with the extracted entities (e.g., total_amount, supplier_name). Note that each row is an extracted entity rather than a document.
    3. With the structured data in BigQuery, you can now build downstream analytical tools to gain actionable insights and to detect errors or fraud.
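
The test above can also be driven from the command line. This is only a sketch: the receipt file name is a placeholder, and the dataset and table names depend on what set-up-pipeline.sh created in your project, so adjust the query accordingly.

# Upload a sample receipt to trigger the pipeline
gcloud storage cp sample-receipt.pdf gs://my-project-id-input-receipts/

# Once the Cloud Function has run, inspect the extracted entities
bq query --use_legacy_sql=false \
  'SELECT * FROM `my-project-id.your_dataset.your_table` LIMIT 20'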

Disclaimer

This community sample is not officially maintained by Google.