NLP-Projects

Text Classification of Twitter Data

Projects that involve Natural Language Processing (NLP) applications such as classification, sentiment analysis, and pre-processing.

This is a simple NLP project: text classification of Twitter data. It shows the steps involved in text classification and how a machine learning model can be trained and tested.

Python libraries used: pandas, re (for regular expressions), nltk (Natural Language Toolkit), scikit-learn

Label-0 : Non Racist Tweets

Label-1 : Racist Tweets

Project Workflow:

1. Import all the necessary libraries
2. Load the dataset (CSV file) using pandas

3. Perform Data Cleaning

 3.1) Keep only the alphabetic characters of the tweet text
 3.2) Remove non-ASCII Unicode characters, which this workflow treats as noise
 3.3) Convert the text to lower case for consistency
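
The three cleaning steps above could be sketched roughly as follows. This is a minimal illustration using only the standard `re` module, not the project's actual code; the function name `clean_tweet` is hypothetical, and collapsing removed characters into a single space is an assumption.

```python
import re


def clean_tweet(text: str) -> str:
    """Return a cleaned copy of a tweet (hypothetical helper)."""
    # 3.2) Drop non-ASCII (Unicode) characters by ignoring encoding errors.
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    # 3.1) Replace every run of non-letter characters with a single space.
    letters_only = re.sub(r"[^A-Za-z]+", " ", ascii_only)
    # 3.3) Lowercase the text and trim leading/trailing spaces.
    return letters_only.lower().strip()
```

With pandas, such a function could be applied to a text column via `Series.apply`; the column name depends on the CSV and is not stated here.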

4. Feature Engineering: Feature engineering is the science (and art) of extracting more information from existing data. We are not adding any new data here; we are making the data we already have more useful. A machine learning model cannot work on raw text directly, so we create numerical features that represent the underlying text.
```
 4.1) Generate the word frequency
 4.2) Check whether a negation term is present in the text
 4.3) Check whether one of the 100 rare words is present in the text
 4.4) Check whether prompt words are present
```
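
As a rough sketch of steps 4.1 to 4.4, the snippet below builds the five features named in this README from already-cleaned text. The word lists `NEGATION_TERMS` and `PROMPT_WORDS`, the helper names, and the reading of "rare words" as the least frequent words in the corpus are all assumptions made for illustration; the project's script may define them differently.

```python
from collections import Counter

# Hypothetical word lists; the actual project may use different terms.
NEGATION_TERMS = {"not", "no", "never", "none", "cannot"}
PROMPT_WORDS = {"what", "why", "how", "who", "when", "where"}


def rare_words(texts, n=100):
    """Return the n least frequent words across all texts (4.1 and 4.3)."""
    counts = Counter(word for text in texts for word in text.split())
    # most_common() sorts by descending count, so read it from the end.
    return {word for word, _ in counts.most_common()[:-n - 1:-1]}


def build_features(text, rare):
    """Map one cleaned tweet to a dict of numerical features."""
    words = text.split()
    return {
        "word_count": len(words),
        "char_count": len(text),
        "any_neg": int(any(w in NEGATION_TERMS for w in words)),    # 4.2
        "any_rare": int(any(w in rare for w in words)),             # 4.3
        "is_question": int(any(w in PROMPT_WORDS for w in words)),  # 4.4
    }
```

A list of such dicts can be passed to `pandas.DataFrame` to obtain one numerical column per feature.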
5. Create numerical features in the dataset: word_count, any_neg, any_rare, char_count, is_question
6. Split the dataset into train and test sets
7. Train an ML model for text classification (a Naive Bayes classifier from sklearn)
8. Evaluate the ML model using metrics such as classification_report and accuracy score
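
The split, train, and evaluate steps might look like the sketch below. The README does not say which Naive Bayes variant was used, so `GaussianNB` is an assumption here, as are the function name `train_and_evaluate`, the 25% test size, and the stratified split.

```python
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def train_and_evaluate(X, y, test_size=0.25, seed=0):
    """Split the data, fit a Naive Bayes model, and score it on the test set."""
    # Stratify so both labels (0 and 1) appear in the train and test sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = GaussianNB()  # assumed variant; the source only says "NaiveBayes"
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred, zero_division=0)
    return model, accuracy, report
```

Here `X` would hold the numerical feature columns (word_count, any_neg, any_rare, char_count, is_question) and `y` the 0/1 labels.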

Result: Accuracy Score was found to be 60%

Note: The commented-out code in the script can also be executed to get a better understanding of the workflow in this project.

Repository files: README.md, twitter_text.csv, twitter_text_classification.py

NakulLakhotia/Twitter-Text-Classification-on-Racism
