nstory/bpraass


Boston Public Records as a Spreadsheet (BPRaaSS)

The City of Boston provided us with PDF documents showing all the public records requests made to the city. This project converts those PDFs into CSV files that can be conveniently worked with in Excel.

You probably want to download one or more of the CSV files in output/.

Nota bene

This project uses optical character recognition (OCR) to extract text from PDF files provided by the City of Boston. This process is not 100% accurate; be mindful that the spreadsheet will contain errors.
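Because OCR mistakes can shift or drop fields, a quick structural check on a downloaded CSV may help before relying on it. The following is a minimal sketch, not part of this project: the function name `find_ragged_rows` and the sample column names are hypothetical, and it only detects rows whose field count differs from the header, not misread characters.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (line_number, row) pairs whose field count differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        # Empty input: nothing to check.
        return []
    problems = []
    for row in reader:
        if len(row) != len(header):
            # reader.line_num is the number of source lines read so far.
            problems.append((reader.line_num, row))
    return problems


# Hypothetical sample data; the real column names may differ.
sample = "id,date,summary\n1,2023-01-05,Police report\n2,2023-01-06\n"
print(find_ragged_rows(sample))  # [(3, ['2', '2023-01-06'])]
```

Rows flagged this way are worth comparing against the original PDF page by hand.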

Build

$ docker compose build && docker compose run app
$ make clean all

What are these files and stuff?

  • input/*.pdf: PDF files provided by Boston Public Records
  • input/AllCityofBostonRequeststoDate_06202023.CSV: CSV file provided by the city that contains some columns
  • output/*.pdf: the PDF files after being run through OCRmyPDF. This step is necessary because some pages are images (not text)
  • output/*.csv: spreadsheets created from the .pdf files
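The file layout above can be sketched as a mapping from one input PDF to its two outputs. This is an illustration only, since the actual build is driven by make. It assumes that output files keep the input file's base name, and it assumes OCRmyPDF is invoked with its `--skip-text` option (which leaves pages that already contain text untouched). The function `plan_outputs` and the file name `requests.pdf` are hypothetical.

```python
from pathlib import PurePosixPath


def plan_outputs(input_pdf):
    """Map an input PDF path to its assumed OCR'd PDF, CSV, and OCR command."""
    src = PurePosixPath(input_pdf)
    # Assumption: outputs mirror the input file name under output/.
    ocr_pdf = PurePosixPath("output") / src.name
    csv_path = ocr_pdf.with_suffix(".csv")
    # Assumption: --skip-text is used so only image-only pages are OCR'd.
    command = ["ocrmypdf", "--skip-text", str(src), str(ocr_pdf)]
    return ocr_pdf, csv_path, command


ocr_pdf, csv_path, command = plan_outputs("input/requests.pdf")
print(ocr_pdf)    # output/requests.pdf
print(csv_path)   # output/requests.csv
print(" ".join(command))
```

The command list is only built here, not executed; passing it to `subprocess.run` would require OCRmyPDF to be installed.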

See also

LICENSE

This project is released under the MIT License.
