Add Reuters-21578 dataset #147

achyudh · 2018-09-25T08:08:08Z

I hope the commit messages are descriptive enough of the changes made. This PR is complementary to https://git.uwaterloo.ca/jimmylin/Castor-data/merge_requests/7#note_71256 in the data repository.

The dataset path in args.py now points at the dataset folder rather than the dataset/SST folder. Hence the SST folder was added to the paths in the SST dataset class.

Impavidity · 2018-09-27T03:53:46Z

common/evaluators/reuters_evaluator.py

+            # Using binary accuracy
+            for tensor1, tensor2 in zip(scores.round().long().data, batch.label.data):
+                print(tensor1, tensor2)
+                if np.array_equal(tensor1, tensor2):


So your evaluation criterion is exact match: full credit when the prediction matches the label exactly, and zero otherwise?

Yup. Since only one or two of the 90 classes would be the positive label, I'm not sure something like a Hamming score would work. The fraction of labels that are incorrectly predicted would still be too small in that case.
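To make the trade-off above concrete, here is a minimal NumPy sketch comparing exact-match accuracy with the fraction of wrongly predicted labels (the Hamming-style measure). It is not the code from this PR: the function names, the toy label matrix, and the "predict nothing" model are assumptions made purely for illustration.

```python
import numpy as np


def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(np.all(y_true == y_pred, axis=1)))


def hamming_fraction_wrong(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))


# Toy data (hypothetical): 4 documents, 90 classes, one positive label each.
num_classes = 90
y_true = np.zeros((4, num_classes), dtype=int)
y_true[np.arange(4), [3, 10, 42, 77]] = 1

# A degenerate model that never predicts any label.
y_pred = np.zeros_like(y_true)

print(exact_match_accuracy(y_true, y_pred))             # 0.0
print(f"{hamming_fraction_wrong(y_true, y_pred):.4f}")  # 0.0111
```

With sparse labels, a model that misses every document still gets only about 1% of label slots wrong, which is why the per-label fraction says little here while exact match reports the failure directly.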

Impavidity · 2018-09-27T03:54:07Z

common/evaluators/reuters_evaluator.py

+            scores = self.model(batch.text)
+            # Using binary accuracy
+            for tensor1, tensor2 in zip(scores.round().long().data, batch.label.data):
+                print(tensor1, tensor2)


remove this

My bad. I made the change but forgot to push it.

Impavidity

LGTM

achyudh added 5 commits September 25, 2018 04:00

Add ReutersTrainer, ReutersEvaluator options in Factory classes

c6c514c

Add Reuters to Kim-CNN command line arguments

51a72c7

Fix SST dataset path according to changes in Kim-CNN args

7e26889

The dataset path in args.py now points at the dataset folder rather than the dataset/SST folder. Hence the SST folder was added to the paths in the SST dataset class.

Add Reuters dataset class, and support in __main__

156b64f

Add Reuters dataset trainers and evaluators

aa1ef44

daemon requested review from daemon and Impavidity and removed request for daemon September 25, 2018 15:15

Impavidity reviewed Sep 27, 2018


achyudh added 2 commits September 27, 2018 00:05

Remove debug print statement in reuters_evaluator

81ab4de

Fix rounding bug in reuters_trainer and reuters_evaluator

e9a0da2

Impavidity approved these changes Oct 3, 2018


Impavidity merged commit 1b817d3 into castorini:master Oct 3, 2018

achyudh changed the title ~~WIP: Add Reuters-21578 dataset~~ Add Reuters-21578 dataset Oct 3, 2018
