This repo collects the basic techniques used in most NLP tasks.
It includes:
- Data Pre-processing:
  - Removing unwanted characters using regex or a similar technique
  - Removing HTML tags, Unicode and other symbols, numbers, and links
  - Tokenization
  - Changing the case of all words (e.g. lowercasing)
  - Text normalization
  - Stemming
  - Lemmatization
  - Stop word removal
- Embeddings and Representations
- Modelling Techniques:
  - Classical ML algorithms such as Naive Bayes
  - State-of-the-art models such as LSTM, BERT, and GPT
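
As a rough illustration of the pre-processing steps listed above, here is a minimal sketch that uses only the Python standard library. It is not the code in this repo. The names `preprocess`, `strip_suffix`, and `STOP_WORDS` are hypothetical, the stop-word list is deliberately tiny, and the suffix stripper is a toy stand-in for a real stemmer such as Porter's. Lemmatization is not shown because it needs a dictionary-backed library.

```python
import html
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch); real projects usually take a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "at", "for"}

TAG_RE = re.compile(r"<[^>]+>")                 # HTML tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
NON_ALPHA_RE = re.compile(r"[^a-z\s]")          # digits, punctuation, symbols


def strip_suffix(token):
    """Toy stemmer: drop a few common suffixes. This is NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Clean raw text and return a list of normalized tokens."""
    text = TAG_RE.sub(" ", text)    # remove HTML tags
    text = html.unescape(text)      # turn entities such as &amp; into characters
    text = URL_RE.sub(" ", text)    # remove links
    # Decompose accented characters, then drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = text.lower()             # change case
    text = NON_ALPHA_RE.sub(" ", text)  # remove numbers and remaining symbols
    tokens = text.split()           # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [strip_suffix(t) for t in tokens]             # crude stemming


print(preprocess("<p>Visit https://example.com for 3 GREAT cafés &amp; movies!</p>"))
# → ['visit', 'great', 'cafe', 'movie']
```

The order of the steps matters in this sketch: tags and links are removed before the symbol filter runs, otherwise fragments of URLs would survive as tokens. The toy stemmer also shows why a real stemmer or lemmatizer is preferable, since it turns "running" into "runn".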