add data
cnap committed Sep 1, 2022
1 parent b3b0241 commit c46d3aa
Showing 62 changed files with 53,045 additions and 3 deletions.
29 changes: 26 additions & 3 deletions README.md
@@ -1,18 +1,41 @@
# Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses


```
@article{doi:10.1162/tacl_a_00282,
author = {Napoles, Courtney and Nădejde, Maria and Tetreault, Joel},
title = {Enabling Robust Grammatical Error Correction in New Domains: Data Sets, Metrics, and Analyses},
journal = {Transactions of the Association for Computational Linguistics},
volume = {7},
number = {},
pages = {551-566},
year = {2019},
doi = {10.1162/tacl_a_00282},
URL = {https://doi.org/10.1162/tacl_a_00282},
eprint = {https://doi.org/10.1162/tacl_a_00282},
abstract = { Until now, grammatical error correction (GEC) has been primarily evaluated on text written by non-native English speakers, with a focus on student essays. This paper enables GEC development on text written by native speakers by providing a new data set and metric. We present a multiple-reference test corpus for GEC that includes 4,000 sentences in two new domains (formal and informal writing by native English speakers) and 2,000 sentences from a diverse set of non-native student writing. We also collect human judgments of several GEC systems on this new test set and perform a meta-evaluation, assessing how reliable automatic metrics are across these domains. We find that commonly used GEC metrics have inconsistent performance across domains, and therefore we propose a new ensemble metric that is robust on all three domains of text.}
}
```

## Repository Contents

### Data
`data/` contains the dev and test splits, with a subdirectory for each domain
containing
* the original sentences (`source`)
* system outputs (`amu`, `lstm`, `lstm-r`, `marian`, `nus`, `transformer`)
* human corrections (`ref[0-3]`)
* negative control used for collecting human ratings (`source+error`)

Domains are `fce`, `wiki`, and `yahoo`.
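
The layout described above could be read with something like the following sketch. It assumes each file is plain UTF-8 text with one sentence per line and that all files within a domain are line-aligned; the README does not state this explicitly, although the identical line counts per domain in this commit suggest it. The function name `load_domain` and the constants are illustrative and not part of the repository.

```python
from pathlib import Path

# File names taken from the README's description of each domain directory.
SYSTEMS = ["amu", "lstm", "lstm-r", "marian", "nus", "transformer"]
REFERENCES = [f"ref{i}" for i in range(4)]


def load_domain(data_dir, split, domain):
    """Read the files of one domain into a dict: file name -> list of lines."""
    domain_dir = Path(data_dir) / split / domain
    names = ["source", "source+error"] + REFERENCES + SYSTEMS
    columns = {}
    for name in names:
        path = domain_dir / name
        if path.exists():  # skip files that are not present locally
            columns[name] = path.read_text(encoding="utf-8").splitlines()
    # Assumption: files are line-aligned, so every file has the same length.
    lengths = {len(lines) for lines in columns.values()}
    if len(lengths) > 1:
        raise ValueError(
            f"files in {domain_dir} are not line-aligned: {sorted(lengths)}"
        )
    return columns
```

For example, `load_domain("data", "dev", "fce")["ref0"][0]` would give the first human correction from the first annotator, under the alignment assumption above.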

`DOMAIN-corpus-scores.csv` has the mean human rating for each system for that domain.
`DOMAIN-segment-scores.csv` has the mean human rating by sentence for each system.

Data from the `yahoo` domain was sampled from the Yahoo Answers corpus, created from [L6 - Yahoo! Answers Comprehensive Questions and Answers version 1.0](https://webscope.sandbox.yahoo.com/catalog.php?datatype=l). This Yahoo Answers corpus can be requested free of charge for research purposes. Access to data from the `yahoo` domain will require you to first gain access to this Yahoo Answers corpus.

Once you have gained access to the L6 corpus, please forward the acknowledgment to courtney.napoles@grammarly.com, along with your affiliation and a short description of how you will be using the data, and we will provide access to data from the `yahoo` domain.

### Metric

Coming soon. Please watch this repository or email courtney.napoles@grammarly.com
for updates.
9 changes: 9 additions & 0 deletions data/dev/fce-corpus-scores.csv
@@ -0,0 +1,9 @@
system,score
amu,70.12675168300169
marian,76.65454039204046
nus,73.62093171468169
lstm,74.0639212701712
lstm-r,74.25294092169094
ref,83.53971450846448
source,67.58924425799425
transformer,73.67473826848831
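
As a small illustration of how a corpus-score file like the one above might be consumed, the sketch below parses the CSV text and ranks systems by mean human rating. The data is copied from this commit; the function name `rank_systems` is hypothetical.

```python
import csv
import io

# Contents of data/dev/fce-corpus-scores.csv as shown in this commit.
DEV_FCE_SCORES = """\
system,score
amu,70.12675168300169
marian,76.65454039204046
nus,73.62093171468169
lstm,74.0639212701712
lstm-r,74.25294092169094
ref,83.53971450846448
source,67.58924425799425
transformer,73.67473826848831
"""


def rank_systems(csv_text):
    """Return (system, score) pairs sorted from highest to lowest mean rating."""
    rows = csv.DictReader(io.StringIO(csv_text))
    scores = [(row["system"], float(row["score"])) for row in rows]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_systems(DEV_FCE_SCORES)
for system, score in ranking:
    print(f"{system:12s}{score:6.2f}")
```

On this file, the human references (`ref`) rank highest and the uncorrected `source` ranks lowest.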
969 changes: 969 additions & 0 deletions data/dev/fce-segment-scores.csv

Large diffs are not rendered by default.

968 changes: 968 additions & 0 deletions data/dev/fce/amu
968 changes: 968 additions & 0 deletions data/dev/fce/lstm
968 changes: 968 additions & 0 deletions data/dev/fce/lstm-r
968 changes: 968 additions & 0 deletions data/dev/fce/marian
968 changes: 968 additions & 0 deletions data/dev/fce/nus
968 changes: 968 additions & 0 deletions data/dev/fce/ref0
968 changes: 968 additions & 0 deletions data/dev/fce/ref1
968 changes: 968 additions & 0 deletions data/dev/fce/ref2
968 changes: 968 additions & 0 deletions data/dev/fce/ref3
968 changes: 968 additions & 0 deletions data/dev/fce/source
968 changes: 968 additions & 0 deletions data/dev/fce/source+error
968 changes: 968 additions & 0 deletions data/dev/fce/transformer

9 changes: 9 additions & 0 deletions data/dev/wiki-corpus-scores.csv
@@ -0,0 +1,9 @@
system,score
amu,76.1204142371234
marian,75.63849307580175
nus,75.83963496112736
lstm,77.64170007288625
lstm-r,78.20706997084547
ref,82.1782707725948
source,75.83908527696795
transformer,71.37706207482994
993 changes: 993 additions & 0 deletions data/dev/wiki-segment-scores.csv
992 changes: 992 additions & 0 deletions data/dev/wiki/amu
992 changes: 992 additions & 0 deletions data/dev/wiki/lstm
992 changes: 992 additions & 0 deletions data/dev/wiki/lstm-r
992 changes: 992 additions & 0 deletions data/dev/wiki/marian
992 changes: 992 additions & 0 deletions data/dev/wiki/nus
992 changes: 992 additions & 0 deletions data/dev/wiki/ref0
992 changes: 992 additions & 0 deletions data/dev/wiki/ref1
992 changes: 992 additions & 0 deletions data/dev/wiki/ref2
992 changes: 992 additions & 0 deletions data/dev/wiki/ref3
992 changes: 992 additions & 0 deletions data/dev/wiki/source
992 changes: 992 additions & 0 deletions data/dev/wiki/source+error
992 changes: 992 additions & 0 deletions data/dev/wiki/transformer

9 changes: 9 additions & 0 deletions data/dev/yahoo-corpus-scores.csv
@@ -0,0 +1,9 @@
system,score
amu,74.45150188808789
marian,76.77618679809727
nus,75.86965634348488
lstm,78.57583308812713
lstm-r,78.20431011230447
ref,81.95218233534406
source,75.33571244666767
transformer,71.98429392869407
1,000 changes: 1,000 additions & 0 deletions data/dev/yahoo-segment-scores.csv

9 changes: 9 additions & 0 deletions data/test/fce-corpus-scores.csv
@@ -0,0 +1,9 @@
system,score
amu,71.2290628245067
marian,77.31783241853346
nus,74.25183856252788
lstm,74.53604528259906
lstm-r,74.70310228452755
ref,83.2790090491025
source,68.70614955743453
transformer,74.12012869010536
969 changes: 969 additions & 0 deletions data/test/fce-segment-scores.csv
968 changes: 968 additions & 0 deletions data/test/fce/amu
968 changes: 968 additions & 0 deletions data/test/fce/lstm
968 changes: 968 additions & 0 deletions data/test/fce/lstm-r
968 changes: 968 additions & 0 deletions data/test/fce/marian
968 changes: 968 additions & 0 deletions data/test/fce/nus
968 changes: 968 additions & 0 deletions data/test/fce/ref0
968 changes: 968 additions & 0 deletions data/test/fce/ref1
968 changes: 968 additions & 0 deletions data/test/fce/ref2
968 changes: 968 additions & 0 deletions data/test/fce/ref3
968 changes: 968 additions & 0 deletions data/test/fce/source
968 changes: 968 additions & 0 deletions data/test/fce/source+error
968 changes: 968 additions & 0 deletions data/test/fce/transformer

9 changes: 9 additions & 0 deletions data/test/wiki-corpus-scores.csv
@@ -0,0 +1,9 @@
system,score
amu,75.83009232264341
marian,75.95725461613222
nus,75.7170547862002
lstm,77.82579810495629
lstm-r,78.33034135082603
ref,82.00431000971824
source,75.94059402332367
transformer,71.68669096209915
993 changes: 993 additions & 0 deletions data/test/wiki-segment-scores.csv
992 changes: 992 additions & 0 deletions data/test/wiki/amu
992 changes: 992 additions & 0 deletions data/test/wiki/lstm
992 changes: 992 additions & 0 deletions data/test/wiki/lstm-r
992 changes: 992 additions & 0 deletions data/test/wiki/marian
992 changes: 992 additions & 0 deletions data/test/wiki/nus
992 changes: 992 additions & 0 deletions data/test/wiki/ref0
992 changes: 992 additions & 0 deletions data/test/wiki/ref1
992 changes: 992 additions & 0 deletions data/test/wiki/ref2
992 changes: 992 additions & 0 deletions data/test/wiki/ref3
992 changes: 992 additions & 0 deletions data/test/wiki/source
992 changes: 992 additions & 0 deletions data/test/wiki/source+error
992 changes: 992 additions & 0 deletions data/test/wiki/transformer

9 changes: 9 additions & 0 deletions data/test/yahoo-corpus-scores.csv
@@ -0,0 +1,9 @@
system,score
amu,74.51883044584744
marian,76.43361718788238
nus,75.96793544756035
lstm,78.25074205941374
lstm-r,78.16818993784567
ref,82.5717656487056
source,74.78109981402636
transformer,72.41518438310574
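
With all three test-split score files available, one simple cross-domain check is to see which systems score below the uncorrected `source` in each domain. The sketch below uses the values from the three `data/test/*-corpus-scores.csv` files in this commit, rounded to two decimals for brevity; the helper name `below_source` is hypothetical, and this comparison is only an illustration, not the meta-evaluation described in the paper.

```python
# Mean human ratings from data/test/*-corpus-scores.csv in this commit,
# rounded to two decimal places.
TEST_SCORES = {
    "fce": {
        "amu": 71.23, "marian": 77.32, "nus": 74.25, "lstm": 74.54,
        "lstm-r": 74.70, "ref": 83.28, "source": 68.71, "transformer": 74.12,
    },
    "wiki": {
        "amu": 75.83, "marian": 75.96, "nus": 75.72, "lstm": 77.83,
        "lstm-r": 78.33, "ref": 82.00, "source": 75.94, "transformer": 71.69,
    },
    "yahoo": {
        "amu": 74.52, "marian": 76.43, "nus": 75.97, "lstm": 78.25,
        "lstm-r": 78.17, "ref": 82.57, "source": 74.78, "transformer": 72.42,
    },
}


def below_source(scores):
    """Return the systems rated lower than the uncorrected source, sorted by name."""
    baseline = scores["source"]
    return sorted(
        name
        for name, score in scores.items()
        if name not in ("source", "ref") and score < baseline
    )


for domain, scores in TEST_SCORES.items():
    print(domain, below_source(scores))
```

In these numbers, every system is rated above the source on `fce`, while some systems fall below it on `wiki` and `yahoo`.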
1,001 changes: 1,001 additions & 0 deletions data/test/yahoo-segment-scores.csv

Binary file added tacl2019-enabling.pdf
Binary file not shown.
