Day 3: Data Modelling

Date: 28/12/2023

Topic:

-Data
-Types of Attributes
-Preprocessing
-Transformations
-Measures
-Visualization

ML Application Development:

1. Task: Problem definition
2. Collection of data
3. Clean & process that data (Pre-processing)
4. Feature Transformation
5. Feature Selection
6. Learning algorithm
7. Evaluation, Visualization and Interpretation (to uncover hidden information)
8. Conclude

Data: collection of data objects and their attributes

Types of Data:

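-Numerical : Made up of numbers
	-Continuous : Can take any real value within a range
	-Discrete : Countable values, such as integers
	
	Numerical attributes are measured on one of two scales:
	
	-Interval : Distinct, ordered & differences are meaningful (no true zero)
		Eg: Temperature in Celsius
		
	-Ratio : All of the above, plus a true zero so that ratios are meaningful
		Eg: Age, ticket fare
	
-Categorical : Made up of labels (words or codes)
	-Nominal : Distinct
		Eg: Eye color, zip codes
	
	-Ordinal : Distinct & ordered
		Eg: Ranking list
	
	Ex: A numerical attribute can be binned into ordered categories.
	Age: 15 to 78yrs
		-<20yrs : Teen
		-20 to 45yrs : Young
		->45yrs : Adult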
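The age example can be sketched in code as follows. This is a minimal illustration, assuming non-overlapping bins (under 20, 20 to 45, over 45); the cut-off of 45 for the last bin and the helper name `age_group` are assumptions, not part of the original notes.

```python
def age_group(age):
    """Map an age in years to one of three ordered categories."""
    if age < 20:
        return "Teen"
    if age <= 45:   # assumed upper bound of the middle bin
        return "Young"
    return "Adult"

# Made-up sample ages covering the 15 to 78 range
ages = [15, 20, 33, 45, 46, 78]
groups = [age_group(a) for a in ages]
print(groups)
```

The result is an ordinal attribute: the labels are distinct and have a natural order (Teen, then Young, then Adult).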

Types of dataset:

-Record
	-Data Matrix
	-Document data
	-Transaction data
-Graph
	-World wide web
	-Molecular data structure
-Ordered
	-Spatial data
	-Temporal(time series) data
	-Sequential data
	-Genetic data

Knowledge Discovery Process:

1. Goal:
	-Understanding the application domain and the goals of the KDD (Knowledge Discovery in Databases) effort
2. Data selection, acquisition, integration
3. Data cleaning
	-Noise, missing data, outliers, etc.
4. Exploratory data analysis
	-Dimensionality reduction, transformation
	-Selection of an appropriate model for analysis, hypothesis testing
5. Data mining/Machine Learning
	-Selecting an appropriate method that matches the set goals (classification, regression, clustering)
6. Testing and verification
7. Interpretation
8. Conclusions

Data Pre-processing steps:

1. Get the dataset
2. Import libraries
3. Import dataset
4. Analyse the data & identify the missing data
5. Apply feature transformation
6. Split the dataset into training and testing data
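Steps 4 and 6 above can be sketched with the standard library alone. This is only an illustration on made-up toy records; the helper name `split_rows`, the 20% test fraction and the seed are assumptions. In practice a data-frame library would usually handle the import and inspection steps.

```python
import random

# Made-up toy records standing in for an imported dataset.
# None marks a missing value.
rows = [
    {"age": 22, "fare": 7.25, "survived": 0},
    {"age": None, "fare": 71.28, "survived": 1},
    {"age": 26, "fare": 7.92, "survived": 1},
    {"age": 35, "fare": None, "survived": 1},
    {"age": 54, "fare": 51.86, "survived": 0},
]

# Step 4: count the missing values in each column.
missing = {col: sum(r[col] is None for r in rows) for col in rows[0]}

# Step 6: shuffle with a fixed seed, then hold out a fraction for testing.
def split_rows(data, test_fraction=0.2, seed=42):
    shuffled = data[:]                      # copy so the input stays untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train_rows, test_rows = split_rows(rows)
print(missing)
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split reproducible, which helps when comparing models on the same data.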

Feature Transformation:

Label Encoding:

Label encoding converts a categorical variable into numerical format by assigning each distinct category an integer (for example, female becomes 0 and male becomes 1). It is useful because most machine learning models operate only on numerical data. Because the integers imply an order, label encoding suits binary or ordinal variables best.
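A minimal sketch of label encoding in plain Python is shown here. The helper name `fit_label_encoder` and the sample values are hypothetical; categories are numbered in sorted order, which is one common convention (scikit-learn's `LabelEncoder` provides a ready-made version of the same idea).

```python
def fit_label_encoder(values):
    """Assign each distinct category an integer, in sorted order."""
    return {cat: idx for idx, cat in enumerate(sorted(set(values)))}

# Made-up sample column
genders = ["male", "female", "female", "male"]

mapping = fit_label_encoder(genders)
encoded = [mapping[g] for g in genders]

print(mapping)
print(encoded)
```

Keeping the mapping around matters: the same mapping has to be applied to the test data so that a category always receives the same integer.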

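One-hot Encoding:

One-hot encoding converts a categorical variable into numerical format by creating one binary (0/1) column per category. Each row has a 1 in the column of its own category and 0 elsewhere. Unlike label encoding, it does not impose an order on the categories, which makes it suitable for nominal variables with more than two categories.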
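As a rough sketch, one-hot encoding can be written in a few lines of plain Python. The helper name `one_hot` and the sample values are hypothetical; the columns are ordered by sorted category name.

```python
def one_hot(values):
    """Return the sorted categories and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return categories, rows

# Made-up sample column with three categories
embarked = ["S", "C", "Q", "S"]

categories, encoded = one_hot(embarked)
print(categories)
for row in encoded:
    print(row)
```

Each row contains exactly one 1, so no category is treated as larger or smaller than another.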

Feature Scaling:

Feature scaling brings numerical features to a common scale, for example by standardization.
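One common form of feature scaling is standardization, where each value has the feature mean subtracted and is divided by the feature standard deviation. The sketch below assumes the population standard deviation and uses made-up values; the helper name `standardize` is hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:                 # constant feature: nothing to scale
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Made-up sample feature
ages = [20, 30, 40, 50, 60]
scaled = standardize(ages)
print([round(v, 3) for v in scaled])
```

After standardization, features measured in different units (such as years and currency) contribute on a comparable scale.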

Homework:

Machine Learning Problem Statement: Titanic Survival Prediction

Objective:

Develop a machine learning model to analyze the Titanic dataset and predict passenger survival based on various features. The model should utilize techniques such as simple imputation for handling missing values, label encoding and one-hot encoding for categorical variables, and standardization for numerical features.

Dataset:

The dataset contains information about Titanic passengers, including features such as age, gender, class, ticket fare, and whether the passenger survived or not (target variable).

Tasks:

Data Exploration:

Explore the dataset to understand the distribution of features.
Identify missing values and outliers.

Data Preprocessing:

Handle missing values using simple imputation for features like age and embarked.
Apply label encoding to convert categorical variables like gender into numerical representations.
Use one-hot encoding for categorical variables with more than two categories, such as embarked.
Standardize numerical features like age and fare to bring them to a common scale.
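The imputation step can be sketched as follows, assuming the mean is used for a numerical feature and the most frequent value for a categorical one. The helper name `impute`, the strategy names and the sample values are all hypothetical and are not taken from the Titanic dataset itself.

```python
from statistics import mean, mode

def impute(values, strategy):
    """Replace None with the mean or the most frequent observed value."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in values]

# Made-up sample columns; None marks a missing value
ages = [22, None, 26, 35, None]
embarked = ["S", "C", None, "S", "Q"]

ages_filled = impute(ages, "mean")
embarked_filled = impute(embarked, "most_frequent")

print(ages_filled)
print(embarked_filled)
```

The fill value should be computed from the training data only and then reused for the test data, so that no information leaks from the test set.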

Feature Engineering:

Explore the creation of new features or interactions that might enhance predictive power.
Consider combining or transforming existing features if necessary.
