Text classification is the task of assigning one or more predefined categories to free text. Text classifiers can be used to organize, structure, and categorize almost any kind of text. For example, news articles can be organized by topic, support tickets by urgency, chat conversations by language, and brand mentions by sentiment. There are many approaches to automatic text classification, which can be grouped into three types of systems:
- Rule-based systems
- Machine Learning based systems
- Hybrid systems
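To make the first category concrete, a rule-based system can be as small as a keyword lookup. The sketch below is only an illustration and is not part of this project: the `RULES` table, its labels, and the `classify_by_rules` function are hypothetical names chosen for the example.

```python
import re

# Hypothetical hand-written rules: each label maps to its trigger keywords.
RULES = {
    "urgent": {"outage", "down", "crash", "immediately"},
    "billing": {"invoice", "refund", "charge", "payment"},
}


def classify_by_rules(text, rules=RULES, default="other"):
    """Return the label with the most distinct keyword matches in `text`."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    best_label, best_hits = default, 0
    for label, keywords in rules.items():
        hits = len(tokens & keywords)  # number of distinct keywords found
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label
```

Rules like these are easy to read and debug, but they have to be maintained by hand, which is the main reason machine learning and hybrid systems exist.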
Word embedding algorithms such as Word2vec and GloVe are also used to obtain better vector representations for words and improve the accuracy of classifiers trained with traditional machine learning algorithms. A few typical applications of text classification technology include the following:
- Social media monitoring.
- Brand monitoring.
- Customer service.
- Voice of customer.
Here we will be using Anaconda with Python 3.6 and PyTorch 1.4 with GPU support (CUDA 10 with cuDNN 10).
Installing this project is straightforward. Follow the steps below to create a virtual environment and then install the necessary packages in it.
In PyCharm it is easy:
- Create a new project.
- Navigate to the directory of the project
- Select the option to create a new virtual environment using Conda with Python 3.6.
- Finally, create the project from the existing sources.
- After the project has been created, install the necessary packages from the requirements.txt file using the command:
pip install -r requirements.txt
With Conda it is just as easy:
- Create a new virtual environment using the command:
conda create -n your_env_name python=3.6
- Activate the environment with conda activate your_env_name, then navigate to the project directory.
- Install the necessary packages from the requirements.txt file using the command:
pip install -r requirements.txt
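Once the packages are installed, a quick sanity check can confirm that the environment is usable. The sketch below is a minimal example under stated assumptions: `check_environment` is a hypothetical helper, and the default package list contains only `torch` (the import name of PyTorch) because the contents of requirements.txt are not shown here.

```python
import importlib.util
import sys


def check_environment(required=("torch",), min_python=(3, 6)):
    """Report whether the interpreter and the listed packages are available."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in required:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(check_environment())
```

Any entry reported as False points to a package that still needs to be installed in the active environment.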
This is the complete folder structure of the project.
This file is used for data processing. It creates the train_preprocessed.pickle, validation_preprocessed.pickle and test_preprocessed.pickle files under the data folder.
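The exact preprocessing steps are not listed here, so the following is only a sketch of the general pattern: tokenize each text and pickle the result. The `preprocess`, `save_preprocessed`, and `load_preprocessed` functions and the (tokens, label) record layout are assumptions made for illustration, not the project's actual code.

```python
import pickle
import re
from pathlib import Path


def preprocess(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def save_preprocessed(samples, path):
    """Tokenize (text, label) pairs and pickle them to `path`."""
    records = [(preprocess(text), label) for text, label in samples]
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/ if missing
    with path.open("wb") as fh:
        pickle.dump(records, fh)
    return records


def load_preprocessed(path):
    """Read back the list of (tokens, label) records."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

For example, `save_preprocessed(train_samples, "data/train_preprocessed.pickle")` would write the training split, where `train_samples` is whatever list of (text, label) pairs the dataset provides.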
This file trains the Word2Vec embeddings.
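To give an idea of what Word2Vec learns from, its skip-gram variant is trained on (center word, context word) pairs taken from a sliding window over each sentence. The sketch below only generates those pairs; it is not the project's training code, the `skipgram_pairs` name and the window size are assumptions, and whether the project uses skip-gram or CBOW is not stated here.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for tokens within `window` positions."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                yield center, tokens[j]
```

The embedding model is then trained so that words appearing in similar contexts end up with similar vectors.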
This file will train the LSTM network.
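The network architecture is not shown here, so as background, this is a plain-Python sketch of the standard LSTM cell update for a single time step. It is meant to show what the gates compute, not how the project's PyTorch model is written; the `lstm_step` function and the `params` layout (one `(W, U, b)` tuple per gate) are assumptions made for the example.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(W, x, U, h, b):
    """Compute W @ x + U @ h + b for list-of-lists matrices."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + bias
        for W_row, U_row, bias in zip(W, U, b)
    ]


def lstm_step(x, h, c, params):
    """One LSTM time step; params[name] holds (W, U, b) for that gate."""

    def gate(name, activation):
        W, U, b = params[name]
        return [activation(v) for v in affine(W, x, U, h, b)]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new
```

For classification, the word vectors of a text are fed through this step one after another, and the final hidden state is passed to a small output layer that scores each category.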
This file will be used for prediction of any input text.
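A typical last step of prediction is turning the network's raw output scores into a label. The sketch below shows that step only, under stated assumptions: `softmax` and `predict_label` are hypothetical helper names, and the scores and label names stand in for whatever the trained model actually produces.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores, labels):
    """Return the most probable label together with its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

The returned probability can be shown next to the label so that low-confidence predictions are easy to spot.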
To test the classifier, run main.py; the web server will then start at http://0.0.0.0:5000/
Enter the text to be classified and click on the Predict button.
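The web framework behind main.py is not named here, so the following is only a sketch of the serving pattern using Python's standard library: a handler receives the text, calls the classifier, and returns the label as JSON. The `classify` stand-in, the `PredictHandler` class, the `make_server` helper, and the JSON request format are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(text):
    """Hypothetical stand-in for the trained model's prediction function."""
    return "long" if len(text.split()) > 5 else "short"


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, e.g. {"text": "some input"}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"label": classify(payload.get("text", ""))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def make_server(host="0.0.0.0", port=5000):
    """Build (but do not start) the server; call serve_forever() to run it."""
    return HTTPServer((host, port), PredictHandler)
```

Calling `make_server().serve_forever()` would start listening on port 5000; in a real deployment, `classify` would be replaced by a call into the trained LSTM model.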
Hence we have successfully built the text classifier using Word2vec and LSTM.
Here we have kept the scope small, but you can get better results with pretrained models such as BERT or GPT-2, which have been gaining a lot of popularity recently, and with better word embedding techniques.
- Time- 02-April-22,01:02:30