URDU-SENTIMENT-ANALYSIS-TOOL

A Natural Language Processing (NLP) pipeline for sentiment analysis of Urdu social media posts. The project combines Urdu-specific preprocessing with TF-IDF and Word2Vec features and a logistic regression classifier to label the sentiment of Urdu text.

Project Overview

This project implements a sentiment analysis tool tailored to Urdu text, with a focus on social media content. Because general-purpose NLP tooling offers limited support for Urdu, the project includes custom steps for preprocessing, feature extraction, and model training. It is built on NLTK, Gensim, Scikit-Learn, and Pandas.

Features

Text Preprocessing: Includes tokenization, stopword removal, stemming, and lemmatization customized for Urdu.
Feature Extraction: TF-IDF and Word2Vec models for feature representation.
N-Gram Analysis: Captures context through n-grams.
Sentiment Classification: Logistic Regression classifier for sentiment prediction.
Performance Metrics: Evaluation using accuracy, precision, recall, and F1-score.
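
As a minimal sketch of the n-gram idea listed above, the snippet below builds contiguous token windows from a whitespace-split sentence. The function name `ngrams` and the example sentence are illustrative assumptions, not code taken from this project.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous window of n tokens (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sentence with k tokens yields k - n + 1 windows (none if k < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Illustrative sentence: "This film is very good"
tokens = "یہ فلم بہت اچھی ہے".split()
bigrams = ngrams(tokens, 2)
for gram in bigrams:
    print(" ".join(gram))
```

Bigrams keep pairs such as "very good" together, which single-word features would split apart.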

Installation

Install the dependencies listed under Technologies Used (NLTK, Gensim, Scikit-Learn, Pandas) in your Python environment, then clone the repository:

git clone https://github.com/your-username/urdu-sentiment-analysis.git
cd urdu-sentiment-analysis

Usage

To use the sentiment analysis tool, follow these steps:

Data Preparation: Load Urdu social media text data in a structured format.
Run Preprocessing: Use the provided scripts to clean and preprocess the text.
Train Model: Run the training script to build the logistic regression classifier.
Evaluate Model: Evaluate model performance using the metrics provided.

Example command:

python sentiment_analysis.py --input your_data_file.csv

Preprocessing Pipeline

The Urdu text preprocessing pipeline includes:

Tokenization: Custom Urdu tokenization.
Stopword Removal: Removes common Urdu stopwords.
Stemming & Lemmatization: Reduces words to their root forms for better analysis.
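
The first two steps above could look roughly like the sketch below. It is an assumption-laden illustration, not the project's actual implementation: the stopword list is a tiny hypothetical sample, the regular expressions are one possible way to isolate Urdu words, and stemming and lemmatization are left out.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stopword list; a real list would be much longer.
URDU_STOPWORDS = {"یہ", "ہے", "کی", "کا", "اور", "میں"}

# Arabic-script diacritics (zabar, zer, pesh, and similar marks).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Runs of Arabic-script letters; punctuation, digits, and Latin text are skipped.
URDU_WORD = re.compile(r"[\u0621-\u063A\u0641-\u064A\u0671-\u06D3\u06D5]+")


def tokenize(text: str) -> List[str]:
    """Strip diacritics, then split the text into Urdu word tokens."""
    return URDU_WORD.findall(DIACRITICS.sub("", text))


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in URDU_STOPWORDS]


def preprocess(text: str) -> List[str]:
    """Tokenize and filter stopwords (no stemming or lemmatization here)."""
    return remove_stopwords(tokenize(text))


# Illustrative input: "This film is very good." with an Urdu full stop and a hashtag.
cleaned = preprocess("یہ فلم بہت اچھی ہے۔ #movie 123")
print(cleaned)
```

Negation words such as "نہیں" are intentionally absent from the sample stopword list, since removing them can flip the sentiment of a sentence.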

Modeling

The tool trains a logistic regression classifier on TF-IDF and Word2Vec representations of the preprocessed text. N-gram features are included so that the model sees short sequences of adjacent words, not only isolated words.
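
One way to wire TF-IDF n-gram features into a logistic regression classifier with Scikit-Learn is sketched below. The four sentences and their labels are made up for illustration, the hyperparameters are assumptions, and the Word2Vec representation is not shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up toy examples, not taken from the project's dataset.
texts = [
    "یہ فلم بہت اچھی ہے",           # "This film is very good"
    "مجھے یہ کھانا پسند آیا",        # "I liked this food"
    "یہ فلم بہت بری ہے",            # "This film is very bad"
    "مجھے یہ کھانا پسند نہیں آیا",   # "I did not like this food"
]
labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(
    # Whitespace tokens, no lowercasing (Urdu has no letter case),
    # unigrams plus bigrams as features.
    TfidfVectorizer(token_pattern=r"\S+", lowercase=False, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

prediction = model.predict(["یہ کھانا بہت اچھا ہے"])  # "This food is very good"
vocabulary = model.named_steps["tfidfvectorizer"].vocabulary_
print(prediction[0], len(vocabulary))
```

With so little data the prediction itself is not meaningful; the point is the shape of the pipeline, where `ngram_range=(1, 2)` adds bigram columns next to the single-word columns.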

Evaluation

Model performance is evaluated on the following metrics:

Accuracy
Precision
Recall
F1-Score
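
For reference, the four metrics can be computed from prediction counts as in the sketch below. The helper `binary_metrics` and the label lists are hypothetical; in practice, `sklearn.metrics` offers equivalent functions such as `accuracy_score` and `precision_recall_fscore_support`.

```python
from typing import Dict, List


def binary_metrics(y_true: List[str], y_pred: List[str],
                   positive: str = "positive") -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = ["positive", "positive", "positive", "negative", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "negative", "negative"]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Precision asks how many predicted positives were correct, while recall asks how many actual positives were found; F1 balances the two.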

Technologies Used

Python
NLTK
Gensim
Scikit-Learn
Pandas

