
This repository contains an analysis of IMDB data from multiple sources, covering movies, cast, box office revenues, movie brands and franchises.


ashmitan/IMDB-Analysis


IMDB Data Analysis Pipeline

Objective:

The aim of the project is to analyse movie data from multiple sources (IMDB, MovieLens, The Numbers and Box Office Mojo) covering movies, cast, box office revenues, movie brands and franchises, and to perform the ETL processes using Talend.

Technologies Used:

ER/Studio

SQL Server Developer Edition

Microsoft SQL Server Management Studio (SSMS)

Talend Real-Time Data Platform 7.1

Tableau Desktop

Microsoft Power BI

Dataset Links:

https://datasets.imdbws.com/

https://www.boxofficemojo.com/franchise/?ref_=bo_nb_fr_secondarytab

https://www.boxofficemojo.com/brand/?ref_=bo_nb_frs_secondarytab

https://grouplens.org/datasets/movielens/25m/

https://www.the-numbers.com/movies/franchises

https://www.the-numbers.com/movies/franchise/Marvel-Cinematic-Universe#tab=summary

https://www.the-numbers.com/movie/Avengers-The-(2012)#tab=box-office
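Combining these sources needs a shared key. MovieLens ships a links.csv file whose imdbId column holds only the numeric part of the IMDb identifier (no "tt" prefix), while the IMDb files key titles by tconst (e.g. tt0114709). The following is a minimal Python sketch of that key conversion, assuming the MovieLens 25M column layout. The function names are hypothetical and not part of this repository, which performs its ETL in Talend.

```python
import csv
import io

# Illustrative rows in the layout of MovieLens 25M links.csv (movieId,imdbId,tmdbId).
LINKS_CSV = """movieId,imdbId,tmdbId
1,0114709,862
2,0113497,8844
"""


def imdb_tconst(imdb_id: str) -> str:
    """Convert a MovieLens imdbId such as '0114709' into an IMDb tconst 'tt0114709'."""
    # Left-pad to 7 digits in case leading zeros were lost (e.g. by a spreadsheet);
    # zfill never truncates, so 8-digit ids pass through unchanged.
    return "tt" + imdb_id.strip().zfill(7)


def movielens_to_tconst(links_text: str) -> dict:
    """Map each MovieLens movieId to the IMDb tconst used by title.basics."""
    reader = csv.DictReader(io.StringIO(links_text))
    return {int(row["movieId"]): imdb_tconst(row["imdbId"]) for row in reader}


mapping = movielens_to_tconst(LINKS_CSV)
print(mapping)  # {1: 'tt0114709', 2: 'tt0113497'}
```

The resulting tconst values can then be joined against the IMDb title tables and, through them, against the box office figures.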

Code Walkthrough:

Run the following scripts in SSMS to set up the staging database:

The Number - stage tables.sql

stg imdb tables - core tables.sql

stg imdb tables expanded part 2.sql

stg_ml_tables.sql
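The IMDb files at datasets.imdbws.com are gzipped, tab-separated text in which missing values are written as the literal \N. Before rows land in the staging tables, those markers have to become NULLs and the numeric columns need to be cast. Below is a minimal Python sketch for title.basics.tsv.gz, assuming its published column names (tconst, titleType, primaryTitle, startYear, runtimeMinutes, genres and so on). The snake_case output keys and helper names are hypothetical and may not match the columns created by the staging scripts.

```python
import csv
import gzip
import io

IMDB_NULL = "\\N"  # IMDb writes missing values as a backslash followed by N


def to_int(value):
    """Cast a numeric field, mapping IMDb's \\N marker to None (SQL NULL)."""
    return None if value == IMDB_NULL else int(value)


def parse_title_basics(lines):
    """Yield staging-ready dicts from title.basics rows (header line included)."""
    # QUOTE_NONE: the files are plain TSV, and titles can contain stray double quotes
    # that the default csv dialect would treat as field quoting.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {
            "tconst": row["tconst"],
            "title_type": row["titleType"],
            "primary_title": row["primaryTitle"],
            "start_year": to_int(row["startYear"]),
            "runtime_minutes": to_int(row["runtimeMinutes"]),
            "genres": [] if row["genres"] == IMDB_NULL else row["genres"].split(","),
        }


def open_imdb_tsv(path):
    """Open a gzipped IMDb dataset file as text lines."""
    return gzip.open(path, mode="rt", encoding="utf-8", newline="")


# One row shaped like the real file plus a hypothetical row with missing values.
SAMPLE = (
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\t"
    "startYear\tendYear\truntimeMinutes\tgenres\n"
    "tt0114709\tmovie\tToy Story\tToy Story\t0\t1995\t\\N\t81\tAdventure,Animation,Comedy\n"
    "tt9999999\tmovie\tUntitled\tUntitled\t0\t\\N\t\\N\t\\N\t\\N\n"
)
rows = list(parse_title_basics(io.StringIO(SAMPLE)))
```

With a downloaded file, consume the generator inside a `with open_imdb_tsv("title.basics.tsv.gz") as f:` block. Streaming row by row keeps memory flat, which matters because title.basics is large.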

Open Talend and set up your database connections and input file connections.

Once the connections test successfully, run the main job.
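Talend jobs are assembled graphically, so there is no script to reproduce here. One transform such a job would likely need is turning the dollar strings scraped from The Numbers and Box Office Mojo (e.g. $1,234,567) into integers before loading. The following is a minimal Python sketch of that step. parse_usd and the placeholder set are hypothetical and are not taken from the repository's main job.

```python
import re
from typing import Optional

# Whole-dollar amounts as shown on box-office pages, e.g. "$1,234,567".
_AMOUNT = re.compile(r"\$?([0-9][0-9,]*)")

# Assumed markers for "no figure reported"; adjust to what the scraped files contain.
_PLACEHOLDERS = {"", "-", "n/a", "N/A"}


def parse_usd(text: str) -> Optional[int]:
    """Convert '$1,234,567' to 1234567; return None (SQL NULL) for placeholders."""
    cleaned = text.strip()
    if cleaned in _PLACEHOLDERS:
        return None
    match = _AMOUNT.fullmatch(cleaned)
    if match is None:
        # Fail loudly instead of silently loading a wrong revenue figure.
        raise ValueError(f"unrecognised dollar amount: {text!r}")
    return int(match.group(1).replace(",", ""))


row = {"title": "Example Film", "domestic": "$1,234,567", "international": "n/a"}
parsed = {k: parse_usd(v) for k, v in row.items() if k != "title"}
print(parsed)  # {'domestic': 1234567, 'international': None}
```

Raising on unrecognised input, rather than defaulting to 0, keeps a formatting change on the source site from quietly corrupting revenue totals.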

Build the visualizations in Tableau and Power BI.
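Tableau and Power BI can both connect to SQL Server directly, so the charts would presumably be built on the tables the Talend job loads. To illustrate the kind of aggregation behind a franchise-revenue chart, here is a minimal Python sketch over hypothetical rows. The field names and figures are invented for the example.

```python
from collections import defaultdict


def franchise_totals(movies):
    """Return (franchise, total_gross, films_counted) tuples, highest total first.

    Films whose worldwide gross is missing (None) are skipped, not counted as 0.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for movie in movies:
        gross = movie["worldwide_gross"]
        if gross is None:
            continue
        totals[movie["franchise"]] += gross
        counts[movie["franchise"]] += 1
    return sorted(
        ((name, totals[name], counts[name]) for name in totals),
        key=lambda item: item[1],
        reverse=True,
    )


# Hypothetical input rows (not real box-office figures).
movies = [
    {"title": "Film A", "franchise": "Franchise X", "worldwide_gross": 300},
    {"title": "Film B", "franchise": "Franchise X", "worldwide_gross": 200},
    {"title": "Film C", "franchise": "Franchise Y", "worldwide_gross": 400},
    {"title": "Film D", "franchise": "Franchise Y", "worldwide_gross": None},
]
print(franchise_totals(movies))
# [('Franchise X', 500, 2), ('Franchise Y', 400, 1)]
```

In SQL Server, the equivalent is a GROUP BY on the franchise column with SUM and COUNT over the gross column. Both aggregates ignore NULLs, which matches how the sketch skips missing figures.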
