News classification task for the Human Language Technologies course.
- Fabrizio De Castelli - M.Sc. in Artificial Intelligence, University of Pisa
- Francesco Aliprandi - M.Sc. in Artificial Intelligence, University of Pisa
- Francesco Simonetti - M.Sc. in Artificial Intelligence, University of Pisa
- Marco Minniti - M.Sc. in Artificial Intelligence, University of Pisa
- Tommaso Di Riccio - M.Sc. in Artificial Intelligence, University of Pisa
This project concerns the multinomial classification of HuffPost news articles from a Kaggle dataset. We compare the performance of several models: Naive Bayes and Logistic Regression as baselines, a bidirectional LSTM as an intermediate model, and, as state-of-the-art approaches, models from the BERT family together with the Llama 3 LLM.
The recommended Python version is 3.11.8.
To install the required libraries, run the following command:
```bash
pip install -r requirements.txt
```
To run the experiments, each model is implemented in a separate notebook located in the `src/test` folder; each notebook is named after the corresponding model.
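For example, assuming Jupyter is available in the environment (it may need to be installed separately if it is not listed in `requirements.txt`), the notebooks can be opened with:

```bash
# Launch Jupyter from the project root and browse to the model notebooks
jupyter notebook src/test
```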
Running the bidirLSTM experiment requires pre-trained embeddings. Several pre-trained embeddings were considered for this analysis, but only the 300-dimensional GloVe 6B vectors were tested. To run the code, create a folder named `embeddings` in the root directory and place the `glove.6B.300d.txt` file in it.
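As a sketch of this setup step, assuming the standard Stanford NLP download mirror for the GloVe 6B archive (the download URL is not part of this repository), the embeddings can be fetched and placed as follows:

```bash
# Create the embeddings folder expected by the bidirLSTM notebook
mkdir -p embeddings
# Download the GloVe 6B archive (assumed mirror) and extract only the 300d file
wget https://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip glove.6B.300d.txt -d embeddings
rm glove.6B.zip
```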
Alongside the model notebooks, the `src/test` folder includes notebooks with dataset and pipeline statistics (`dataset_stat` and `pipeline_stat`), as well as a notebook showcasing the clustering analysis performed with BERTopic.
Finally, the `src/test/unit_testing` folder contains Python scripts for unit testing the pipeline and its functions.
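As an example, assuming the scripts follow the standard `test_*.py` naming convention, the unit tests can be run from the project root with pytest:

```bash
# Run the pipeline unit tests from the project root
python -m pytest src/test/unit_testing
```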