prdctofchem/Titanic

Titanic

Analysis and predictions of the Titanic data set from Kaggle

1. Goals and Assumptions

Primary goal: Predict the survival of passengers in the test group

  • Engineer the various features (e.g. age, social class, ticket fare) so that they are suitable inputs for a machine learning algorithm
  • Train various machine learning models to make predictions of survival
  • Compare the outcomes of the models using metrics such as classification reports and confusion matrices
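
To make the comparison step concrete, here is a minimal sketch of what a confusion matrix and the figures in a classification report measure. It is written in plain Python so the arithmetic is visible; the labels and the two "models" are made-up toy values, not results from this repository, and the function names `confusion_counts` and `report` are hypothetical.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count (actual, predicted) pairs for binary labels (1 = survived, 0 = did not)."""
    pairs = Counter(zip(y_true, y_pred))
    return {
        "tp": pairs[(1, 1)],  # survived, predicted survived
        "fp": pairs[(0, 1)],  # did not survive, predicted survived
        "fn": pairs[(1, 0)],  # survived, predicted did not survive
        "tn": pairs[(0, 0)],  # did not survive, predicted did not survive
    }

def report(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    c = confusion_counts(y_true, y_pred)
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (c["tp"] + c["tn"]) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 0, 1, 1]
model_b = [1, 1, 1, 1, 0, 1, 1, 0]

for name, preds in (("model_a", model_a), ("model_b", model_b)):
    print(name, confusion_counts(y_true, preds), report(y_true, preds))
```

In this toy case both sets of predictions have the same accuracy but different precision and recall, which is why a single number is rarely enough to compare models. With scikit-learn installed, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` provide the same kind of summary.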

Secondary goal: Determine which features are the most important predictors.

  • Trial and error: What happens to the predictions when particular features are excluded from the training data (do they improve or deteriorate)?
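
The trial-and-error idea can be sketched as a small ablation loop: fit a model on all features, then refit with one feature left out at a time and compare accuracy on held-out rows. Everything below is an assumption made for illustration: the rows are invented, the feature names `sex`, `pclass` and `port` are placeholders, and the "model" is a deliberately simple majority-vote lookup table rather than any model used in this repository.

```python
from collections import Counter, defaultdict

FIELDS = ("sex", "pclass", "port", "survived")

def make_rows(records):
    """Turn tuples into dicts keyed by the (hypothetical) field names."""
    return [dict(zip(FIELDS, r)) for r in records]

def fit_lookup(rows, features):
    """Learn the majority outcome for each combination of the chosen feature values."""
    votes = defaultdict(Counter)
    for row in rows:
        votes[tuple(row[f] for f in features)][row["survived"]] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    # Fallback for combinations never seen in training: the overall majority class.
    default = Counter(row["survived"] for row in rows).most_common(1)[0][0]
    return table, default

def accuracy(rows, features, table, default):
    """Fraction of rows whose looked-up prediction matches the true outcome."""
    hits = sum(
        table.get(tuple(row[f] for f in features), default) == row["survived"]
        for row in rows
    )
    return hits / len(rows)

def ablation(train, test, features):
    """Score the full feature set, then each set with one feature removed."""
    feature_sets = {"all features": list(features)}
    for dropped in features:
        feature_sets[f"without {dropped}"] = [f for f in features if f != dropped]
    results = {}
    for label, kept in feature_sets.items():
        table, default = fit_lookup(train, kept)
        results[label] = accuracy(test, kept, table, default)
    return results

# Invented toy rows, not taken from the Kaggle data.
train = make_rows([
    ("female", 1, "S", 1), ("female", 2, "C", 1), ("female", 3, "S", 1),
    ("male", 1, "C", 1), ("male", 2, "S", 0), ("male", 3, "S", 0),
    ("male", 3, "S", 0), ("male", 2, "S", 0), ("male", 2, "S", 0),
])
test = make_rows([
    ("female", 1, "C", 1), ("female", 3, "S", 1), ("female", 2, "S", 1),
    ("male", 3, "C", 0), ("male", 2, "S", 0), ("male", 1, "S", 0),
])

results = ablation(train, test, ["sex", "pclass", "port"])
for label, score in results.items():
    print(f"{label:<16} {score:.2f}")
```

A feature whose removal lowers the held-out score is a candidate important predictor; one whose removal raises it may be adding noise. The toy numbers here say nothing about the real data set.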

Secondary goal: Determine why the best-performing model fits the data.

  • Statistical theory: What is the intended use of the model, and what attributes of the data make it an appropriate fit?

2. Exploratory Data Analysis

Primary goal: Find clues to meaningful relationships in the data and identify critical predictors.

  • Which features have a stark impact on survival?
  • Display these relationships in a simple, obvious manner for a non-technical audience (i.e. visualization).
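
One simple way to surface a stark relationship is to compute the survival rate for each value of a feature and draw it as a bar. The sketch below does this in plain Python with a text bar chart. The rows are invented, the column names `Sex` and `Survived` are assumed to match the Kaggle CSV headers, and the helpers `survival_rate_by` and `text_bar_chart` are hypothetical names.

```python
from collections import defaultdict

def survival_rate_by(rows, feature):
    """Return {group value: fraction survived} for one feature."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        totals[row[feature]] += 1
        survived[row[feature]] += row["Survived"]
    return {group: survived[group] / totals[group] for group in totals}

def text_bar_chart(rates, width=20):
    """Render the rates as fixed-width text bars, highest rate first."""
    lines = []
    for group, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(rate * width)
        lines.append(f"{str(group):<8} {bar:<{width}} {rate:.0%}")
    return "\n".join(lines)

# Invented toy rows, for illustration only.
rows = [
    {"Sex": "female", "Survived": 1}, {"Sex": "female", "Survived": 1},
    {"Sex": "female", "Survived": 1}, {"Sex": "female", "Survived": 0},
    {"Sex": "male", "Survived": 1}, {"Sex": "male", "Survived": 0},
    {"Sex": "male", "Survived": 0}, {"Sex": "male", "Survived": 0},
]

rates = survival_rate_by(rows, "Sex")
chart = text_bar_chart(rates)
print(chart)
```

For a non-technical audience, the same grouped rates would normally be passed to a plotting library instead of printed as text; the grouping step is the part that carries over.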

Secondary goal: Demonstrate interesting relationships amongst the data, even if they do not correlate to critical predictors.

  • The definition of "interesting" will depend on the intended audience (e.g. a cruise line selling tickets for, or a shipyard building, a Titanic Mark II; an agency conducting safety investigations or studying social bias in a state of emergency).

3. Data Cleaning, Feature Engineering

Primary goal: Prepare the features to be fed into a machine learning algorithm.

  • Some of the predictors are binary or class-based.
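
Binary and class-based predictors usually need to be turned into numbers before a model can use them. The following is a minimal sketch of two common encodings, assuming a two-valued feature such as sex and a multi-valued feature such as passenger class; the sample values and the helper names `encode_binary` and `one_hot` are illustrative, not taken from this repository.

```python
def encode_binary(values, positive):
    """Map a two-valued feature to 0/1, with `positive` encoded as 1."""
    return [1 if v == positive else 0 for v in values]

def one_hot(values, prefix):
    """Create one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }

# Invented sample values.
sex = ["male", "female", "female", "male"]
pclass = [3, 1, 2, 3]

sex_encoded = encode_binary(sex, positive="female")
pclass_encoded = one_hot(pclass, prefix="pclass")

print(sex_encoded)
for name, column in pclass_encoded.items():
    print(name, column)
```

One-hot columns avoid implying that class 3 is "three times" class 1, at the cost of extra columns. If pandas is available, `pandas.get_dummies` performs the same kind of expansion on a DataFrame.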

4. Valuation, Machine Learning Analysis

  • In this case, I am building a classifier, which means some statistical models (for example, those designed to predict a continuous value) are less appropriate.
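
One way to see the distinction: a plain linear model produces an unbounded score, while a classifier such as logistic regression squashes that score into a probability between 0 and 1 and then applies a threshold. The sketch below illustrates this under stated assumptions; the weights, bias and feature values are invented, and the source does not say which models were actually tried.

```python
import math

def linear_score(features, weights, bias):
    """Unbounded weighted sum, as a plain linear regression would output."""
    return bias + sum(w * x for w, x in zip(weights, features))

def sigmoid(z):
    """Logistic function: maps any real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_survival(features, weights, bias, threshold=0.5):
    """Return (probability, class label) for one passenger."""
    probability = sigmoid(linear_score(features, weights, bias))
    return probability, int(probability >= threshold)

# Hypothetical parameters: features are [is_female, pclass].
weights = [2.5, -0.5]
bias = -1.0

for features in ([1, 1], [0, 3]):
    score = linear_score(features, weights, bias)
    probability, label = predict_survival(features, weights, bias)
    print(features, f"score={score:+.2f}", f"p={probability:.3f}", f"label={label}")
```

The raw score for the second passenger is negative, which has no meaning as a survival outcome, whereas the probability and thresholded label do.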
