Spring-22-Database-Project

partners

Yewon Shin (yshin31)
Kyoungjin Lim (klim30)

List any changes / issues encountered

1. txt file formatting using ; for netflix data

Some content names in the ‘title’ column of the Netflix data include ‘,’. We were using ‘,’ to separate data fields in our txt files, so the data was not inserted properly when we loaded the files into database tables. Instead of ‘,’, we decided to use ‘;’ to separate data fields.
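A minimal sketch of the delimiter problem, assuming a made-up record whose title contains a comma (the field values are illustrative only and are not taken from our dataset):

```python
# Hypothetical record: show_id, title, type (values invented for illustration).
record = ["s1", "Hello, World", "Movie"]

# Serialize the same record with each delimiter.
comma_line = ",".join(record)
semicolon_line = ";".join(record)

# Splitting on ',' produces an extra field because of the comma in the title.
comma_fields = comma_line.split(",")

# Splitting on ';' recovers the original three fields.
semicolon_fields = semicolon_line.split(";")

print(len(comma_fields), len(semicolon_fields))
```

The same issue would reappear for any value that itself contains ‘;’, so this approach relies on the data not using that character.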

2. primary key changed for covid data

In Phase B, we indicated ‘Date’ as the primary key for the Covid-19 data. However, when setting up the database for this phase, we noticed that ‘Date’ is not a unique data field: different countries may have Covid data recorded on the same date, so this field cannot be used as a primary key. To solve this issue, we created an additional column called ‘record_id’ (char data type). This column is made by merging the date and the country name, so it uniquely identifies each Covid-19 record. Because of this change, we made some modifications to our ER diagram. Also, when creating tables on the database, we renamed our tables so that they are more convenient to query in the future:

Covid: data table with Covid-19 related data
Content: data table with content information on Netflix
Financial: data table with Netflix Quarterly Income Statement
Influenced_by: data table relating Covid & Content
- Note: when creating a table for the Netflix data, we named the table Content instead of Netflix so that it corresponds to our ER diagram
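The ‘record_id’ key described in this section can be sketched as follows. This is an illustration under assumptions, not our actual schema: the underscore separator, the column names and sizes, and the sample rows are hypothetical, and SQLite is used only so the sketch runs on its own.

```python
import sqlite3

def make_record_id(date, country):
    # Assumed format: date and country name joined with an underscore.
    return f"{date}_{country}"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Covid ("
    "record_id CHAR(64) PRIMARY KEY, record_date TEXT, country TEXT)"
)

# Two made-up records that share a date but differ by country.
rows = [("2020-03-01", "Canada"), ("2020-03-01", "Japan")]
for date, country in rows:
    conn.execute(
        "INSERT INTO Covid VALUES (?, ?, ?)",
        (make_record_id(date, country), date, country),
    )

count = conn.execute("SELECT COUNT(*) FROM Covid").fetchone()[0]

# Re-inserting the same date and country violates the primary key.
try:
    conn.execute(
        "INSERT INTO Covid VALUES (?, ?, ?)",
        (make_record_id("2020-03-01", "Japan"), "2020-03-01", "Japan"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

With ‘Date’ alone as the key, the second insert would already have failed; with the merged key, only a true duplicate of date and country is rejected.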

3. Influenced_by relation

In our ER diagram, two strong entities, Content and Covid, are connected by a relationship. Besides the three entity tables, we also created a table for this ‘Influenced_by’ relationship. Instead of creating it directly from a .sql file, we created a separate ‘influenced_by’ txt file, just as we did for the three entities. The big idea is that we merged the two csv files (Content, Covid) in the preprocessing stage, joining them on two matching data fields:

- The Covid record’s country equals the Content’s released country
- The Covid record’s year matches the Content’s date added year

Here are the steps (preprocessing done with Python Pandas):

- Renamed columns ‘date_added’ and ‘released_country’ in the Content csv file to ‘date’ and ‘country’
- Renamed column ‘record_date’ in the Covid csv file to ‘date’
- Created a new column called ‘Year’ in each csv file by extracting the year from ‘date_added’ for Netflix data and ‘record_date’ for Covid data
- Merged the two csv files on the matching country and year columns
- Turned the merged csv file into a txt file with two fields: show_id, record_id
- Created a table called ‘Influenced_by’ on the database using the above txt file
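The merge on country and year can be sketched in plain Python as below. This is not the code in merge_influenced_by.py, which uses Pandas; it is a dependency-free stand-in for the same join. The function names, the sample rows, the ISO date format, and the ‘;’ separator for the output lines are all assumptions made for illustration, and the sketch works on the original column names rather than the renamed ones.

```python
from collections import defaultdict

def year_of(date_str):
    # Assumes ISO-style dates such as "2020-03-01".
    return date_str[:4]

def build_influenced_by(content_rows, covid_rows):
    """Pair each show_id with every record_id that shares its country and year."""
    # Index Covid record ids by (country, year) for fast lookup.
    covid_index = defaultdict(list)
    for row in covid_rows:
        key = (row["country"], year_of(row["record_date"]))
        covid_index[key].append(row["record_id"])

    pairs = []
    for row in content_rows:
        key = (row["released_country"], year_of(row["date_added"]))
        for record_id in covid_index.get(key, []):
            pairs.append((row["show_id"], record_id))
    return pairs

# Made-up sample rows, not taken from the real csv files.
content = [
    {"show_id": "s1", "released_country": "Canada", "date_added": "2020-05-10"},
    {"show_id": "s2", "released_country": "Japan", "date_added": "2019-01-02"},
]
covid = [
    {"record_id": "2020-03-01_Canada", "country": "Canada", "record_date": "2020-03-01"},
    {"record_id": "2020-03-02_Canada", "country": "Canada", "record_date": "2020-03-02"},
    {"record_id": "2020-03-01_Japan", "country": "Japan", "record_date": "2020-03-01"},
]

pairs = build_influenced_by(content, covid)

# One output line per (show_id, record_id) pair, as in the two-field txt file.
lines = [f"{show_id};{record_id}" for show_id, record_id in pairs]
```

Because every Content row is paired with every Covid record from the same country and year, the resulting table can be much larger than either input.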
