Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add document classification models and datasets (#171)
* Add ReutersTrainer, ReutersEvaluator options in Factory classes
* Add Reuters to Kim-CNN command line arguments
* Fix SST dataset path according to changes in Kim-CNN args
  The dataset path in args.py was made to point at the dataset folder rather than dataset/SST folder. Hence SST folder was added to paths in the SST dataset class
* Add Reuters dataset class, and support in __main__
* Add Reuters dataset trainers and evaluators
* Remove debug print statement in reuters_evaluator
* Fix rounding bug in reuters_trainer and reuters_evaluator
* Add LSTM for baseline text classification measurements
* Add eval metrics for lstm_baseline
* Set batch_first param in lstm_baseline
* Remove onnx args from lstm_baseline
* Pack padded sequences in LSTM_baseline
* Add TensorBoardX support for Reuters trainer
* Add Arxiv Academic Paper Dataset (AAPD)
* Add Hidden Bottleneck Layer to BiLSTM
* Fix packing of padded tensors in Reuters
* Add cmdline args for Hidden Bottleneck Layer for BiLSTM
* Include pre-padding lengths in AAPD dataset
* Remove duplication of preprocessing code in AAPD
* Remove batch_size condition in ReutersTrainer
* Add ignore_lengths option to ReutersTrainer and ReutersEvaluator
* Add AAPDCharQuantized and ReutersCharQuantized
* Rename Reuters_hierarchical to ReutersHierarchical
* Add CharacterCNN for document classification
* Update README.md for CharacterCNN
* Fix table in README.md for CharacterCNN
* Add AAPDHierarchical for HAN
* Update HAN for changes in Reuters dataset endpoints
* Fix bug in CharCNN when running on CPU
* Add AAPD dataset support for KimCNN
* Fix dataset paths for SST-1
* Fix dimensions of FC1 in CharCNN
* Add model checkpointing for Reuters based on F1
* Refactor LSTM baseline __main__
* Add precision, recall and F1 to Reuters evaluator
* Checkpoint only at the end of an epoch for ReutersTrainer
  Add detailed log printing for dev evaluations
* Fix log_template and dev_log_template in ReutersTrainer
* Add IMDB dataset
* Fix duplicate printing of header in ReutersTrainer
* Add support for single_label datasets in ReutersTrainer
* Add support for IMDB dataset in lstm_baseline and lstm_reg
* Fix evaluator call in main method of HAN
* Add IMDB for HAN
* Fix for single_label
* Fix evaluate_dataset method for single_label datasets
* Reduce default patience to 5 epochs before early stopping
* Revert change to save_state rather than the entire model
* Add Yelp 2018 dataset
* Integrate Yelp2018 with LSTM baseline
* Replace Yelp2018 with Yelp2014 dataset
* Add Yelp2014 to LSTM Baseline
* Integrate Yelp14 into LSTM Regularization
* Remove dropout in HBL for LSTM Baseline and Reg
* Add Yelp for HAN
* Fix the saving issue for HAN
* Fix loading for HAN
* Fix typo in ReutersEvaluator
* Print to STDOUT rather than logger
* Print XML-CNN eval to STDOUT rather than logger
* Update max_length for IMDB dataset
* Add single_label support for char_cnn
* Fix evaluation method for char_cnn
* Remove unwanted parameters from ReutersTrainer and ReutersEval
* Fix code formatting in lstm_reg/args
* Add support for IMDB and Yelp in KimCNN
* Fix single_label incorporation
* Remove unnecessary conditions
* Fix num_classes in Yelp2014
* Add single_label support for XML-CNN
* Fix call to evaluator in XML-CNN
* Address PEP8 issues
* Address PEP8 issues
* Address PEP8 issues
* Address PEP8 issues
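The SST path fix described in the commit message can be pictured roughly as follows. This is only a sketch of the idea, not the repository's code: the class name `SSTPaths`, the `NAME` attribute, the `train_path` helper, and the `train.tsv` filename are all hypothetical. The one thing taken from the message is that the command-line dataset path now points at the top-level dataset folder, so the dataset class appends the `SST` sub-folder itself.

```python
import os


class SSTPaths:
    """Hypothetical stand-in for the SST dataset class (illustrative names only)."""

    # Sub-folder appended by the dataset class, since the args path
    # is assumed to point at the parent dataset folder.
    NAME = "SST"

    @classmethod
    def train_path(cls, data_dir, filename="train.tsv"):
        # data_dir: the dataset root passed on the command line (assumed).
        # filename: placeholder; the real file names are not given in the commit message.
        return os.path.join(data_dir, cls.NAME, filename)


if __name__ == "__main__":
    print(SSTPaths.train_path("datasets"))
```

Keeping the sub-folder inside the dataset class means each dataset (Reuters, AAPD, IMDB, Yelp2014) could share one root argument, which appears to be the motivation for the change.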
1 parent 57f53a8, commit dc086e8
Showing 23 changed files with 311 additions and 257 deletions.