-Data
-Types of Attributes
-Preprocessing
-Transformations
-Measures
-Visualization
ML Application Development:
1. Task: Problem definition
2. Collection of data
3. Clean & process that data (Pre-processing)
4. Feature Transformation
5. Feature Selection
6. Learning algorithm
7. Evaluation, Visualization and Interpretation (uncovering hidden information)
8. Conclude
Data: collection of data objects and their attributes
Types of Data:
-Numerical : Made up of numbers
-Continuous: can take any real value in a range (infinitely many possible values)
-Discrete: countable values, typically integers
-Categorical : Made up of words
-Ordinal : Distinct & ordered
Eg: Ranking list
-Nominal : Distinct
Eg: eye color, zip codes
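-Interval : Ordered, and differences between values are meaningful, but there is no true zero
Eg: Temperature in Celsius or Fahrenheit
-Ratio : All of the above, plus a true zero, so ratios between values are meaningful
Eg: Age, length, counts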
Ex: Age: 15 to 78yrs
-<20yrs : Teen
-20 to 45yrs : Young
->45yrs : Adult
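The age example above turns a numerical attribute into ordered categories (binning). A minimal sketch in Python, assuming the last bin starts above 45 so that the bins do not overlap; the function name `age_group` and the sample ages are made up for illustration:

```python
def age_group(age):
    """Map an age in years to one of three ordered categories.

    Assumption: bins are <20, 20 to 45 (inclusive), and >45.
    """
    if age < 20:
        return "Teen"
    if age <= 45:
        return "Young"
    return "Adult"


# Hypothetical sample covering the 15 to 78 range from the example.
ages = [15, 19, 20, 33, 45, 46, 78]
groups = [age_group(a) for a in ages]
```

The result is an ordinal attribute: the three labels are distinct and have a natural order.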
Types of dataset:
-Record
-Data Matrix
-Document data
-Transaction data
-Graph
-World wide web
-Molecular data structure
-Ordered
-Spatial data
-Temporal(time series) data
-Sequential data
-Genetic data
Knowledge Discovery Process:
1. Goal:
-understanding the application domain and the goals of the KDD effort
2. Data selection, acquisition, integration
3. Data cleaning
-noise, missing data, outliers, etc.
4. Exploratory data analysis
-Dimensionality reduction, transformation
-selection of appropriate model for analysis, hypothesis testing
5. Data mining/Machine Learning
-selecting an appropriate method that matches the set goals (classification, regression, clustering)
6. Testing and verification
7. Interpretation
8. Conclusions
Data Pre-processing steps:
1. Get the dataset
2. Import libraries
3. Import dataset
4. Analyse the data & identify the missing data
5. Apply feature transformation
6. Split the dataset into training and testing data
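Steps 4 and 6 can be sketched with the Python standard library alone. This is only an illustration on made-up records: the rows, the helper `split_rows`, and the 80/20 split are assumptions, not part of the notes. In practice libraries such as pandas and scikit-learn are commonly used for these steps.

```python
import random
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 22.0, "fare": 7.25, "survived": 0},
    {"age": None, "fare": 71.28, "survived": 1},
    {"age": 26.0, "fare": 7.92, "survived": 1},
    {"age": 35.0, "fare": 53.10, "survived": 1},
    {"age": None, "fare": 8.05, "survived": 0},
]

# Step 4: identify which rows have a missing age.
missing_age = [i for i, r in enumerate(rows) if r["age"] is None]

# Simple (mean) imputation: fill the gaps with the mean of the observed ages.
age_mean = mean(r["age"] for r in rows if r["age"] is not None)
for i in missing_age:
    rows[i]["age"] = age_mean


# Step 6: shuffle, then hold out a fraction of the rows for testing.
def split_rows(data, test_fraction=0.2, seed=0):
    shuffled = data[:]                      # copy so the input order is kept
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_rows(rows)
```

The sketch follows the order of the list above. A common variation is to split first and compute the imputation value from the training rows only, so that no information from the test rows leaks into training.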
Feature Transformation:
Label Encoding:
Label encoding converts a categorical variable into numerical format by assigning each distinct category an integer code (for example, female = 0, male = 1). It is useful because most machine learning algorithms require numerical input. The codes imply an order, so label encoding suits binary or ordinal variables better than nominal variables with many categories.
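A minimal sketch of label encoding in plain Python. The helper `fit_label_encoder` and the sample values are hypothetical; codes are assigned in sorted order of the category names, which is one possible convention.

```python
def fit_label_encoder(values):
    """Build a category -> integer mapping, with codes in sorted order."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


genders = ["male", "female", "female", "male"]
mapping = fit_label_encoder(genders)
encoded = [mapping[g] for g in genders]
```

The mapping has to be learned once and then reused, so that the same category always receives the same code in both training and test data.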
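One-hot Encoding:
One-hot encoding converts a categorical variable into numerical format by creating one binary (0/1) column per category. Each row has a 1 in the column of its own category and 0 in all the others. Unlike label encoding, it does not impose an order on the categories.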
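A minimal sketch of one-hot encoding in plain Python. The helper `one_hot` and the sample port codes are hypothetical; one column is created per distinct category, in sorted order.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


embarked = ["S", "C", "Q", "S"]
categories, encoded = one_hot(embarked)
```

Each row contains exactly one 1, so a variable with k categories becomes k binary columns.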
Feature Scaling:
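One common form of feature scaling is standardization, the technique the homework asks for: each value has the feature mean subtracted and is divided by the feature standard deviation, giving a feature with mean 0 and standard deviation 1. A minimal sketch, assuming the population standard deviation is used; the helper `standardize` and the sample fares are made up for illustration:

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


fares = [7.25, 71.28, 7.92, 53.10]
scaled = standardize(fares)
```

After scaling, features measured in different units (such as age in years and fare in currency) sit on a common scale.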
Homework:
Machine Learning Problem Statement: Titanic Survival Prediction
Objective:
Develop a machine learning model to analyze the Titanic dataset and predict passenger survival based on various features. The model should utilize techniques such as simple imputation for handling missing values, label encoding and one-hot encoding for categorical variables, and standardization for numerical features.
Dataset:
The dataset contains information about Titanic passengers, including features such as age, gender, class, ticket fare, and whether the passenger survived or not (target variable).
Tasks:
Data Exploration:
Explore the dataset to understand the distribution of features.
Identify missing values and outliers.
Data Preprocessing:
Handle missing values using simple imputation for features like age and embarked.
Apply label encoding to convert categorical variables like gender into numerical representations.
Use one-hot encoding for categorical variables with more than two categories, such as embarked.
Standardize numerical features like age and fare to bring them to a common scale.
Feature Engineering:
Explore the creation of new features or interactions that might enhance predictive power.
Consider combining or transforming existing features if necessary.