- Register for Fannie Mae: https://loanperformancedata.fanniemae.com/lppub/index.html#.
- Register for Freddie Mac: https://freddiemac.embs.com/FLoan/Bin/loginrequest.php.
- Pull the mortgage-data-analysis repository in the EC2 instance (`git clone https://github.com/kr900910/mortgage-data-analysis.git`).
- Create a `temp_download` directory inside `mortgage-data-analysis` (`mkdir temp_download`).
- Go to `mortgage-data-analysis/loading_and_modeling` and run `pip install requests==2.5.3`.
- Run `python download_freddie_mac.py`. Enter your credentials and the quarters to download when prompted. This downloads a zip file into the current folder for each quarter.
- Run `python download_fannie_mae.py`. Enter your credentials and the quarters to download when prompted. This likewise downloads a zip file into the current folder for each quarter.
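  If you want to script these downloads, the prompts can be answered on stdin. The sketch below is a hypothetical invocation: the prompt order (username, password, quarters) and the quarter format are assumptions, so check `download_freddie_mac.py` / `download_fannie_mae.py` before relying on it.

  ```bash
  # Hypothetical: answers the script's prompts (username, password, quarters)
  # non-interactively. Prompt order and quarter format are assumptions.
  printf '%s\n' "$USERNAME" "$PASSWORD" "2016Q1 2016Q2" \
    | python download_freddie_mac.py
  ```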
- Start Hadoop, Postgres, and Hive in the EC2 instance.
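  How to start these depends on your installation; a minimal sketch, assuming a standard single-node Hadoop setup with the Hadoop `sbin` scripts on `PATH`, a service-managed Postgres, and Postgres backing the Hive metastore:

  ```bash
  start-dfs.sh                   # HDFS daemons
  start-yarn.sh                  # YARN daemons
  sudo service postgresql start  # Postgres (assumed here to back the Hive metastore)
  hive --service metastore &     # Hive metastore service
  ```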
- If this is your first time, run `. create_hdfs_dir.sh`. This creates the necessary HDFS folders.
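  The folder names are defined in the script itself; a minimal sketch of what `create_hdfs_dir.sh` might do, with hypothetical paths:

  ```bash
  # Hypothetical HDFS paths; the real ones are set in create_hdfs_dir.sh.
  hdfs dfs -mkdir -p /user/"$(whoami)"/fannie_mae
  hdfs dfs -mkdir -p /user/"$(whoami)"/freddie_mac
  ```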
- Run `. unzip_to_HDFS.sh`. This unzips the downloaded files into `mortgage-data-analysis/temp_download`, removes the zip files, loads the unzipped files into HDFS, and then removes the unzipped local copies. Note that this step can take 15-30 minutes depending on the number of quarters being loaded.
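  A condensed sketch of that unzip-load-clean-up cycle, with hypothetical paths and file patterns (the real ones live in `unzip_to_HDFS.sh`):

  ```bash
  # Hypothetical paths; run from loading_and_modeling, where the zips were downloaded.
  for z in *.zip; do
    unzip "$z" -d ../temp_download/ && rm "$z"   # unzip, then drop the archive
  done
  hdfs dfs -put ../temp_download/* /user/"$(whoami)"/raw/  # load into HDFS
  rm ../temp_download/*                                    # remove the local unzipped files
  ```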
- Go to `mortgage-data-analysis/transforming` and run `. create_hive_tables.sh`. This creates Hive metadata for the base Fannie and Freddie data in HDFS and for the combined data sets. Note that this script can take several hours to run, depending on how many quarters of data are present (for 15 quarters, the acquisition data took about 10 minutes and the performance data took about 2 hours).
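  The actual DDL lives in `create_hive_tables.sh`; as an illustration of the general pattern (external tables declared over the files already in HDFS), with hypothetical table, column, and path names:

  ```bash
  # Illustrative only: the table name, columns, and location are assumptions.
  # The Fannie Mae and Freddie Mac text files are pipe-delimited.
  hive -e "
  CREATE EXTERNAL TABLE IF NOT EXISTS fannie_acquisition (
    loan_id   STRING,
    orig_rate DECIMAL(6,3),
    orig_upb  INT
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
  LOCATION '/user/hadoop/raw/fannie_mae';
  "
  ```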
- Once the Hive tables are created, start HiveServer2 by running `hive --service hiveserver2 &`.
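  To confirm HiveServer2 is up before pointing Tableau at it, a quick check with `beeline` (HiveServer2 listens on port 10000 by default):

  ```bash
  # Connects over JDBC and lists the tables created above.
  beeline -u jdbc:hive2://localhost:10000 -e "SHOW TABLES;"
  ```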
- Set up an ODBC connection to the server in Tableau and visualize the data as needed. A sample Tableau workbook, along with a CSV file extracted from one of the Hive tables, is available in the `mortgage-data-analysis/serving` folder.
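  For reference, a CSV extract like the one in the `serving` folder can be produced with `beeline`; the table and output file names below are hypothetical:

  ```bash
  # Hypothetical table/file names; csv2 output is comma-separated with quoting.
  beeline -u jdbc:hive2://localhost:10000 --outputformat=csv2 \
    -e "SELECT * FROM combined_data;" > sample_extract.csv
  ```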