The City of Boston provided us with PDF documents showing all the public records requests made to the city. This project converts those PDFs into CSV files that can be conveniently worked with in Excel.
You probably want to download one or more of these files:
- output/2021AllRequests_Q1_Redacted.csv
- output/2021AllRequests_Q2_Redacted.csv
- output/2021AllRequests_Q3_Redacted.csv
- output/2021AllRequests_Q4_Redacted.csv
- output/2022AllRequests_Q1_Redacted.csv
- output/2022AllRequests_Q2_Redacted.csv
- output/2022AllRequests_Q3_Redacted.csv
- output/2022AllRequests_Q4_Redacted.csv
- output/2023AllRequests_Q1.csv
- 2023_All_Requests_-_Q2_Redacted.csv
- output/City_of_Boston_Public_Records_Requests_2017_Redacted.csv
- output/City_of_Boston_Public_Records_Requests_2018_Redacted.csv
- output/City_of_Boston_Public_Records_Requests_2019_Redacted.csv
- output/City_of_Boston_Public_Records_Requests_2020_Redacted.csv
- output/all_requests.csv: all of the other files concatenated together
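
As a rough illustration of what concatenating the per-period files involves, the sketch below merges several CSV files whose headers may differ into a single file, using only Python's standard library. It is not the project's actual build step: the function name `concat_csvs` and the extra `source_file` column are assumptions made for this example.

```python
import csv
from pathlib import Path


def concat_csvs(paths, out_path):
    """Concatenate CSV files into one file, taking the union of their columns.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    rows = []
    # "source_file" is an assumed extra column recording where each row came from.
    fieldnames = ["source_file"]
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # Collect column names in first-seen order across all files.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                row["source_file"] = Path(path).name
                rows.append(row)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        # restval fills columns a file lacks; extrasaction drops stray fields.
        writer = csv.DictWriter(
            f, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Columns missing from a given file are left empty in the combined output, so files from different years can be merged even if the city changed the log format between them.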
This project uses optical character recognition (OCR) to extract text from PDF files provided by the City of Boston. This process is not 100% accurate; be mindful that the spreadsheet will contain errors.
$ docker compose build && docker compose run app
$ make clean all
- input/*.pdf: PDF files provided by Boston Public Records
- input/AllCityofBostonRequeststoDate_06202023.CSV: CSV file provided by the city that contains some columns
- output/*.pdf: the PDF files run through OCRmyPDF. This is necessary because some pages are images (not text)
- output/*.csv: spreadsheets created from the .pdf files
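
The OCR step described above can be sketched as follows. This is a minimal sketch, not the project's own Makefile rule: it assumes the `ocrmypdf` command-line tool is installed, and the helper names `build_ocr_command` and `ocr_all` are hypothetical. The `--skip-text` option is an assumption about how one might handle PDFs that mix text pages and image pages; the flags the project actually uses are not stated here.

```python
import subprocess
from pathlib import Path


def build_ocr_command(pdf_path, output_dir):
    """Return an ocrmypdf command that writes an OCRed copy into output_dir."""
    pdf_path = Path(pdf_path)
    out_path = Path(output_dir) / pdf_path.name
    # --skip-text leaves pages that already contain text untouched,
    # so only the image-only pages are OCRed (an assumed choice).
    return ["ocrmypdf", "--skip-text", str(pdf_path), str(out_path)]


def ocr_all(input_dir="input", output_dir="output"):
    """Run OCRmyPDF on every PDF in input_dir (requires ocrmypdf on PATH)."""
    Path(output_dir).mkdir(exist_ok=True)
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        subprocess.run(build_ocr_command(pdf, output_dir), check=True)
```

Calling `ocr_all()` from the repository root would read `input/*.pdf` and write same-named files under `output/`, matching the layout listed above.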
- FOIA Request: FOIA logs for years 2016, 2017, 2018, 2019 and 202
- PRR: Boston Public Records Request Log 2021 to Present
- Boston Public Records Log and nstory/boston_public_records for my previous work with the 2017 - 2020 files
This project is released under the MIT License.