NLP_Basics

Basic NLP algorithm theory and implementation

This repo covers the basic techniques used in most NLP tasks.

It includes:

Data Pre-processing:
- Removing unwanted characters using regex or a similar technique
  - Removing HTML tags, Unicode and other symbols, numbers, and links
- Tokenization
- Converting all words to a single case (typically lowercase)
- Text Normalization
  - Stemming
  - Lemmatization
- Stop word removal
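
The pre-processing steps above can be sketched as one small pipeline. This is a minimal, standard-library-only illustration, not the repo's actual implementation: the names `clean_text`, `naive_stem`, `preprocess`, and the tiny `STOP_WORDS` set are assumptions made for this example. The suffix stripper is a crude stand-in for a real stemmer (such as Porter), and lemmatization is left out because it needs a dictionary or a library.

```python
import html
import re
from typing import List

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list).
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in", "for"}


def clean_text(text: str) -> str:
    """Strip HTML tags, links, numbers, and symbols, then lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)               # HTML tags
    text = html.unescape(text)                         # entities such as &amp;
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # links
    text = re.sub(r"[^A-Za-z\s]", " ", text)           # numbers, punctuation, other symbols
    return re.sub(r"\s+", " ", text).strip().lower()


def naive_stem(token: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    """Clean, tokenize on whitespace, drop stop words, then stem."""
    tokens = clean_text(text).split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


print(preprocess("<p>Visit https://example.com for 3 GREAT tips &amp; tricks!</p>"))
```

The order matters here: cleaning runs before tokenization so that tags and links never become tokens, and stop words are removed before stemming so they are matched in their original form.
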
Embeddings and Representations:
- Bag of Words (BOW)
- Term Frequency-Inverse Document Frequency (TF-IDF)
- Word2Vec
- GloVe
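
The two count-based representations in the list above can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the function names `bag_of_words` and `tf_idf` are hypothetical, term frequency is taken as count divided by document length, and inverse document frequency as the unsmoothed `log(N / df)`; libraries often use smoothed variants. Word2Vec and GloVe are learned dense embeddings that need training or pretrained vectors, so they are not shown here.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """TF-IDF with tf = count / doc length and idf = log(N / df) (assumed variant)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for doc, row in zip(docs, counts):
        length = len(doc) or 1  # guard against empty documents
        weights.append([(c / length) * w for c, w in zip(row, idf)])
    return vocab, weights


docs = [["cat", "sat"], ["cat", "ran", "ran"]]
vocab, vectors = bag_of_words(docs)
_, weights = tf_idf(docs)
print(vocab)
print(vectors)
print(weights)
```

A term that appears in every document ("cat" in this toy corpus) gets an IDF of zero under this variant, which is the intended effect: it carries no information for telling the documents apart.
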
Modelling Techniques:
- Classical ML algorithms such as Naive Bayes
- State-of-the-art models such as LSTM, BERT, and GPT
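
As an illustration of the classical end of this list, here is a minimal multinomial Naive Bayes text classifier with Laplace (add-alpha) smoothing. It is a sketch, not the repo's code: the class name `NaiveBayes`, its `fit`/`predict` methods, and the toy training data are assumptions for this example, and tokens never seen in training are simply skipped at prediction time. The neural models (LSTM, BERT, GPT) need a deep learning framework and are not sketched here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over tokenized documents (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; 1.0 is Laplace smoothing

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = {tok for doc in docs for tok in doc}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total_docs)       # log prior
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in doc:
                if tok in self.vocab:              # skip tokens unseen in training
                    score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_docs = [["great", "movie"], ["loved", "it"], ["terrible", "movie"], ["hated", "it"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["great"]))
print(model.predict(["terrible", "unknown"]))
```

Scores are summed in log space to avoid underflow when documents are long, and the smoothing term keeps a word that is absent from one class from driving that class's probability to zero.
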