-
Notifications
You must be signed in to change notification settings - Fork 52
Dataset: Marshal Evictions
The marshal_evictions
dataset can be added to NYCDB by running:
nycdb --download marshal_evictions
nycdb --load marshal_evictions
The dataset's original source is the NYC open data portal's Evictions dataset, which is published by the NYC Department of Investigation (DOI). However, since the data published on the NYC open data portal contains very messy addresses and cannot be matched easily to buildings, we rely on some data processing from the ANHD DAP Portal which cleans, deduplicates, and geocodes the raw data from DOI and provides the processed data via API.
The code that ANHD uses to process the data is open source on Github. This code does three things:
- It cleans the
evictionaddress
field from the DOI data using this python script. - It deduplicates any rows that have the same exact
evictionaddress
,evictionaptnum
,executeddate
, andmarshallastname
. - It appends a
bbl
code to each eviction record using the GeoSearch API provided by NYC Planning Labs. Given the process of cleaning the eviction addresses, GeoSearch typically is able to find abbl
code for about 96% of eviction records from the raw data. See below why having thebbl
code is so important.
Beyond the source code, you can also learn more about the ANHD DAP Portal API through their API documentation.
You can download the data dictionary PDF from NYC Open Data, which provides a decent amount of detail on all of the original columns that DOI provides.
ANHD's DAP Portal also adds a few columns:
-
bbl
is the 10-digit padded Borough Block Lot (BBL) number for the record. Having thisbbl
code allows us to connect this eviction data with all of the other building-level datasets in nycdb. -
cleanedaddress
is the result of running the original value forevictionaddress
through ANHD's address cleaning python script. The goal of the script is to standardize these eviction addresses while not actually changing the location they refer to. -
geosearchaddress
is the address recognized by the GeoSearch geocoding service that ANHD DAP Portal uses to find abbl
value for each address. This field is only populated if it differs from thecleanedaddress
. -
uniqueid
is a concatenation ofevictionaddress
,evictionaptnum
,executeddate
, andmarshallastname
, which you can treat as a unique identifier for each eviction record.
This dataset has the following tables:
-
marshal_evictions_all
contains all eviction data from January 1st, 2017 to the current day. This data updates daily. -
marshal_evictions_17
,marshal_evictions_18
, andmarshal_evictions_19
are tables that just have eviction data for a given year. For 2017 and 2018, some pieces of the cleaning, geocoding, and deduplication process were done manually, or using other tools like R. The 2019 data was pulled directly from the ANHD DAP Portal API described above. Other than some potential minor discrepancies, this data should be the same data found in themarshal_evictions_all
table.
Following are some useful SQL queries related to the dataset.
Remember, the full dataset contains both residential AND commercial evictions
SELECT *
FROM marshal_evictions_all
WHERE residentialcommercialind = 'RESIDENTIAL'
SELECT bbl, count(*) AS total_evictions
FROM marshal_evictions_all
WHERE residentialcommercialind = 'RESIDENTIAL' AND BOROUGH = 'BRONX'
GROUP BY bbl
ORDER BY total_evictions DESC
SELECT concat(marshalfirstname,' ',marshallastname) AS marshalfullname
FROM marshal_evictions_all
WHERE executeddate >= '2020-01-01' AND executeddate < '2021-01-01'
GROUP BY marshalfullname
SELECT date_part('year',executeddate) as year, COUNT(*)
FROM marshal_evictions_all
GROUP BY year
ORDER BY year DESC