This repository contains the code for our paper on text coherence assessment.
Download the preprocessed dataset from here and place it in the processed_data folder.
To train the model, edit the parameters in the run.sh file as required, then run it.
The parameters are as follows:
corpus: one of 'gcdc' or 'wsj'.
sub_corpus: any one of 'Clinton', 'Enron', 'Yelp' or 'Yahoo', if corpus is gcdc.
arch: one of vanilla or hierarchical.
task: one of 3-way-classification, minority-classification, sentence-ordering or sentence-score-prediction for the GCDC dataset; only sentence-ordering for the WSJ dataset.
model_name: the transformer model to use (roberta-base by default).
To train a custom model, run:
bash try.sh
To change the training settings, edit the command in the try.sh file:
python3 main.py --arch <arch_name> --corpus <corpus_name> --task <task_name>
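The parameter constraints listed above can be checked before launching a run. The following is a minimal sketch, not part of this repository: the helper build_train_command is hypothetical, and it assumes that --sub_corpus is accepted during training in the same way it appears in the inference command.

```python
# Hypothetical helper (not part of the repository): validates the documented
# parameter combinations and assembles the argument list for main.py.
GCDC_SUB_CORPORA = {"Clinton", "Enron", "Yelp", "Yahoo"}
ARCHS = {"vanilla", "hierarchical"}
TASKS = {
    "gcdc": {
        "3-way-classification",
        "minority-classification",
        "sentence-ordering",
        "sentence-score-prediction",
    },
    "wsj": {"sentence-ordering"},
}


def build_train_command(corpus, arch, task, sub_corpus=None):
    """Return the main.py argument list, or raise ValueError on a bad combination."""
    if corpus not in TASKS:
        raise ValueError(f"corpus must be one of {sorted(TASKS)}")
    if arch not in ARCHS:
        raise ValueError(f"arch must be one of {sorted(ARCHS)}")
    if task not in TASKS[corpus]:
        raise ValueError(f"task {task!r} is not available for corpus {corpus!r}")

    cmd = ["python3", "main.py", "--arch", arch, "--corpus", corpus, "--task", task]

    if sub_corpus is not None:
        if corpus != "gcdc":
            raise ValueError("sub_corpus applies only when corpus is 'gcdc'")
        if sub_corpus not in GCDC_SUB_CORPORA:
            raise ValueError(f"sub_corpus must be one of {sorted(GCDC_SUB_CORPORA)}")
        # Assumption: --sub_corpus is accepted for training as it is for inference.
        cmd += ["--sub_corpus", sub_corpus]
    return cmd


# Print a command line for one valid combination.
print(" ".join(build_train_command("gcdc", "vanilla", "3-way-classification", "Yelp")))
```

An invalid combination, such as 3-way-classification on the WSJ corpus, raises ValueError instead of producing a command.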
To evaluate on the datasets, run:
bash infer.sh
To change the inference settings, adjust the command:
python3 main.py --sub_corpus <name if gcdc> --inference --arch <arch_name> --corpus <dataset_name> --freeze_emb_layer --task <task_name> --checkpoint_path <saved_checkpoint_path>
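To evaluate every GCDC sub-corpus in turn, the inference command can be generated once per sub-corpus. The sketch below is illustrative only: the helper inference_command and the checkpoint paths (checkpoints/<name>.pt) are hypothetical placeholders, and the flags are copied from the command above.

```python
import shlex

SUB_CORPORA = ["Clinton", "Enron", "Yelp", "Yahoo"]


def inference_command(sub_corpus, checkpoint_path,
                      arch="vanilla", task="3-way-classification"):
    """Build the inference argument list for one GCDC sub-corpus."""
    return [
        "python3", "main.py",
        "--sub_corpus", sub_corpus,
        "--inference",
        "--arch", arch,
        "--corpus", "gcdc",
        "--freeze_emb_layer",
        "--task", task,
        "--checkpoint_path", checkpoint_path,
    ]


for name in SUB_CORPORA:
    # Hypothetical checkpoint location; replace with your saved checkpoint path.
    print(shlex.join(inference_command(name, f"checkpoints/{name}.pt")))
```

Each printed line can be pasted into a shell or into infer.sh after replacing the placeholder checkpoint path.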
We have also uploaded the models here.