- Develop a theoretical foundation and practical experience working with a variety of traditional and contemporary data management tools, enabling you to work productively with any product or toolkit you might encounter
- Gain skill in wrangling and exploring data with a variety of tools inside and outside of databases
- Understand and be able to develop, deliver, and review reproducible data analyses
- Exercise 1: Unix Shell Basics
Work through as much of the Software Carpentry lesson on the Unix Shell as you can. Complete the Setup section just below, then open a shell from the command line or via a terminal session in Jupyter to work through the exercises.
- Exercise 2: SQL Query Operations
Gain experience loading a CSV dataset into a database and using SQL to explore its contents. Write and execute a number of SQL queries using common syntax and functions.
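For instance, a minimal sketch of this load-and-query workflow using Python's sqlite3 module; the file name and columns here are hypothetical, not the actual exercise data:

```python
import csv
import sqlite3

# Hypothetical CSV and column names, used only for illustration.
conn = sqlite3.connect("exercise2.db")
conn.execute("CREATE TABLE IF NOT EXISTS trips (city TEXT, fare REAL)")

with open("trips.csv", newline="") as f:
    rows = [(r["city"], float(r["fare"])) for r in csv.DictReader(f)]
conn.executemany("INSERT INTO trips (city, fare) VALUES (?, ?)", rows)
conn.commit()

# Common SQL patterns: aggregation, grouping, ordering, limiting.
for city, avg_fare in conn.execute(
    "SELECT city, AVG(fare) FROM trips GROUP BY city ORDER BY AVG(fare) DESC LIMIT 5"
):
    print(city, round(avg_fare, 2))
```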
- Exercise 3: Trifacta Wrangler and Apache Spark
Wrangle a dataset using two new tools, Trifacta Wrangler and Apache Spark. Results should include a cleaned-up dataset and summary statistics.
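On the Spark side, a rough PySpark sketch of the clean-up and summary-statistics step might look like the following; the input path and the amount column are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("exercise3-wrangling").getOrCreate()

# Hypothetical input file and columns; substitute the exercise dataset.
df = spark.read.csv("raw_data.csv", header=True, inferSchema=True)

# Simple clean-up: drop rows with missing values and obviously bad records.
cleaned = df.dropna().filter(F.col("amount") >= 0)

# Summary statistics over the numeric columns.
cleaned.describe().show()

# Persist the cleaned-up result.
cleaned.write.mode("overwrite").csv("cleaned_data", header=True)
```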
- Exercise 4: Schema and SQL
Gain experience loading a CSV dataset into a star schema. Explore the data by writing and executing a number of SQL queries using common syntax and functions, and describe your findings.
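To give a feel for what querying a star schema looks like, here is a hedged sketch that joins a hypothetical fact table to two dimension tables; the table and column names are invented for illustration and are not the exercise schema:

```python
import sqlite3

conn = sqlite3.connect("exercise4.db")

# Hypothetical star schema: a trip fact table keyed to date and company dimensions.
query = """
SELECT d.month, c.company_name, COUNT(*) AS trips, SUM(f.trip_total) AS revenue
FROM fact_trips f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_company c ON f.company_key = c.company_key
GROUP BY d.month, c.company_name
ORDER BY revenue DESC;
"""
for row in conn.execute(query):
    print(row)
```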
- Project 1: Unix Shell Operations (counts, filters, and pipelines)
- Project 2: Build the schema and explore the dataset using SQL
- Project 3: Analysis of non-relational data using PySpark (including its DataFrame and SQL interfaces); see the sketch after this list
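For Project 3, PySpark's DataFrame and SQL interfaces can be mixed freely. A minimal sketch, assuming a hypothetical JSON input with invented column names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("project3").getOrCreate()

# Hypothetical semi-structured input; Spark infers a schema from the JSON records.
events = spark.read.json("events.json")

# DataFrame interface: column selection and filtering.
recent = events.select("user_id", "event_type", "timestamp").where("event_type IS NOT NULL")

# SQL interface: register a temporary view and query it with plain SQL.
recent.createOrReplaceTempView("events")
spark.sql("""
    SELECT event_type, COUNT(*) AS n
    FROM events
    GROUP BY event_type
    ORDER BY n DESC
""").show()
```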
Identify and describe your dataset, its source, and what appeals to you about it. Acquire the data and perform an initial exploration to determine which themes you wish to explore. Describe the questions you want to be able to answer with the data, any concerns you have about the data, and any challenges you expect to have to overcome.
Based on what you found above, wrangle the data into a format suitable for analysis. This may involve cleaning, filtering, merging, and modeling steps, any and all of which are valid for this project. Describe your process as you proceed, and document any scripts, databases, or other models you develop. Be specific about any key decisions to modify or remove data, how you overcame any challenges, and all assumptions you make about the meaning of variables and their values.
Explore and analyze your data in its wrangled form. Follow through on the themes you identified in Part 1 with queries or scripts that answer the questions you had in mind. Be clear about the answers you discover, discussing them and whether the results match your expectations. Include charts or other visuals that support your analysis. You may use Tableau, matplotlib, ggplot, or other tools we have not covered in class for visualization (and only for visualization), but be sure to export images from those tools and to include any images properly in your notebook writeup and slides.
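If you use matplotlib for the visuals, a hedged sketch of producing and exporting a chart for the notebook writeup might look like this; the values plotted are placeholders, not analysis results:

```python
import matplotlib.pyplot as plt

# Placeholder values, purely to illustrate chart creation and export.
hours = list(range(24))
trip_counts = [100 + 10 * h for h in hours]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(hours, trip_counts)
ax.set_xlabel("Hour of day")
ax.set_ylabel("Number of trips")
ax.set_title("Trips by hour (placeholder data)")

# Export the image so it can be embedded in the notebook writeup and slides.
fig.savefig("trips_by_hour.png", dpi=150, bbox_inches="tight")
```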
What we focus on:
- At what times do customers use taxis the most? (See the query sketch after this list.)
- During peak hours, what are people's destinations?
- Which areas are the most popular for taxi pickups?
- How are taxis priced, and which company has the highest initial charge?
- Why do some companies have a high initial charge?
- Which companies have an initial charge?
- Why do some companies receive higher tips?
- Which payment type do people usually prefer?
- How do trip totals differ between weekends and weekdays?
- What is the relationship between time and pickup location?
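As an example of how the first question could be approached, here is a hedged PySpark sketch that counts pickups by hour; the file name, the trip_start_timestamp column, and its timestamp format are assumptions to be checked against the actual extract:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("taxi-peak-hours").getOrCreate()

# Assumed file name, column name, and timestamp format; adjust to the real data.
trips = spark.read.csv("chicago_taxi_2016.csv", header=True, inferSchema=True)

# Count pickups by hour of day to find the peak usage times.
by_hour = (
    trips
    .withColumn("hour", F.hour(F.to_timestamp("trip_start_timestamp", "MM/dd/yyyy hh:mm:ss a")))
    .groupBy("hour")
    .count()
    .orderBy(F.desc("count"))
)
by_hour.show(24)
```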
Sometimes the most value can be gained from one dataset when it is studied alongside data drawn from other sources. Identify and describe at least one additional data source that can complement your analysis. Pull this additional data into your chosen environment and explore at least one more theme you are able to further analyze that depends upon a combination of data from both sources.
In order to gain more information from the Chicago taxi dataset, we found another dataset that contains weather data for 30 cities in North America from 2012 to 2017. To extract what we need, we use csvcut to select the Chicago column and then use xsv to keep only the 2016 records, saving the result as chicago_weather_2016.csv.
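The same extraction could also be sketched in pandas instead of csvcut/xsv; this assumes the weather file has a datetime column and a Chicago column, which should be verified against the actual download:

```python
import pandas as pd

# Assumed file and column names; the actual download may differ.
weather = pd.read_csv("temperature.csv", parse_dates=["datetime"])

# Keep only the timestamp and the Chicago column, then filter to 2016.
chicago = weather[["datetime", "Chicago"]]
chicago_2016 = chicago[chicago["datetime"].dt.year == 2016]

chicago_2016.to_csv("chicago_weather_2016.csv", index=False)
```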