Code for the EMNLP 2022 paper "Improved Grammatical Error Correction by Ranking Elementary Edits", which provides a state-of-the-art approach to grammatical error correction.
- Install the requirements: pip install -r requirements.txt
- (optional) Install ERRANT for evaluation.
- Download W&I-LOCNESS data
mkdir -p data
cd data && wget https://www.cl.cam.ac.uk/research/nl/bea2019st/data/wi+locness_v2.1.bea19.tar.gz
tar -xzvf wi+locness_v2.1.bea19.tar.gz
cd ..
- (To reproduce finetuning and evaluation) Download the edits generated by the GECToR model:
cd data
mkdir -p bea_reranking && cd bea_reranking
wget https://www.dropbox.com/s/m5dot9rp0vwkcc8/gector_variants.tar.gz
tar -xzvf gector_variants.tar.gz
cd ../..
- (To reproduce finetuning and evaluation) Download model checkpoints.
| Checkpoint folder | Language | Best F0.5 | Model | Threshold | Basic model weight |
|---|---|---|---|---|---|
| pie_bea-gector | English | 56.05⭐ | roberta-base | 0.8 | 0.1 |
| pie_bea_ft2-gector | English | 57.51⭐ | roberta-large | 0.8 | 0.1 |
| clang_large_ft2-gector | English | 58.94⭐ | roberta-large | 0.8 | 0.1 |
| ru_200K_gpt | Russian | 53.44✔️ | sberbank-ai/ruRoberta-large | 0.7 | 0.1 |
| ru_200K_gpt_ft1 | Russian | 55.04✔️ | sberbank-ai/ruRoberta-large | 0.8 | 0.1 |

⭐ on the BEA-2019 development set; ✔️ on the RULEC-GEC test set.
To obtain the RULEC-GEC data, follow the instructions in the RULEC-GEC repository. The zip archive with the edits is available via the link; the password is the correction of the first error in its training data.
- English, GECToR: see our modification of GECToR repository.
- English, BERT-GEC: run beam search with a large beam size (e.g., 15) using their code, then postprocess the output with
python bertgec/output_to_json.py -i BERT_GEC_OUTPUT_FOLDER/test.nbest.tok -o OUTPUT.jsonl
python bertgec/process_bert_gec_outputs.py -i OUTPUT.jsonl -s INPUT_FILE -o OUTPUT.variants -t -3.0 -j
If the data is simply a list of tokenized sentences, append the -r option to the last command.
- Russian: uses a modification of a GPT-like model, TO APPEAR SOON.
You may use your own generator if it produces a file in the appropriate format (use the provided GECToR edits as a reference).
For each generated edit, our model returns the probability that the edit is correct and applies the edits whose probabilities are higher than the given threshold. We recommend a default threshold of 0.8 or 0.9, or tuning it on the development set.
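The thresholding step can be sketched as follows. This is an illustrative snippet, not the repository's actual API; in particular, the edit representation as a (start, end, replacement, probability) tuple over source tokens is our assumption:

```python
# Illustrative sketch (not the repository's real API): keep edits whose
# probability exceeds the threshold, drop overlapping spans, and apply
# the survivors right-to-left so earlier indices stay valid.

def apply_edits(tokens, scored_edits, threshold=0.8):
    kept = [e for e in scored_edits if e[3] > threshold]
    kept.sort(key=lambda e: (e[0], e[1]))
    selected, last_end = [], -1
    for start, end, repl, prob in kept:
        if start >= last_end:  # greedily skip edits overlapping a kept span
            selected.append((start, end, repl))
            last_end = end
    for start, end, repl in reversed(selected):
        tokens[start:end] = repl  # apply right-to-left
    return tokens

tokens = "He go to school yesterday .".split()
edits = [(1, 2, ["went"], 0.95),   # "go" -> "went", confident
         (3, 3, ["the"], 0.45)]    # low-probability insertion, filtered out
print(" ".join(apply_edits(tokens, edits, threshold=0.8)))
# -> He went to school yesterday .
```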
# Faster simultaneous decoding (see the paper)
python apply_model.py -c CHECKPOINT_FOLDER -C CHECKPOINT_NAME -v TEST_VARIANTS_PATH
-O OUTPUT_FOLDER --n_max 8 [-m MODEL_NAME; DEFAULT=roberta-base] [-T THRESHOLDS ...; DEFAULT=0.4 0.5 0.6 0.7 0.8 0.9] [-a BASIC_MODEL_WEIGHTS ...] [-r]
# Better stagewise decoding (see the paper)
python apply_staged_model.py -c CHECKPOINT_FOLDER -C CHECKPOINT_NAME -v TEST_VARIANTS_PATH
-O OUTPUT_FOLDER -s 8 [-m MODEL_NAME; DEFAULT=roberta-base] [-T THRESHOLDS ...; DEFAULT=0.7 0.8 0.9] [-a BASIC_MODEL_WEIGHTS ...] [-r]
Add the -r flag when the variants were obtained from unlabeled data and the correct answers are not known.
- For example, to make predictions on the development set using the checkpoints/pie_bea_ft2-gector/checkpoint_2.pt checkpoint with stagewise decoding and evaluate them for threshold=0.9, run
python apply_staged_model.py -c checkpoints/pie_bea_ft2-gector -C checkpoint_2.pt \
-i data/wi+locness/m2/ABCN.dev.gold.bea19.m2 -v data/bea_reranking/gector_variants/bea.dev.variants \
-O dump/reranking -s 8 -a 0.1
./scripts/evaluate.sh -i data/wi+locness/m2/ABCN.dev.gold.bea19.m2 -r dump/reranking/pie_bea_ft2-gector/0.9_staged.output
It should produce
=========== Span-Based Correction ============
TP FP FN Prec Rec F0.5
2250 903 5211 0.7136 0.3016 0.5605
==============================================
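The reported numbers follow the standard span-based GEC convention: F0.5 weights precision twice as heavily as recall. As a quick sanity check, the scores can be recomputed from the TP/FP/FN counts:

```python
# Derive precision, recall and F0.5 from span-level TP/FP/FN counts.
def span_scores(tp, fp, fn, beta=0.5):
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f = (1 + beta ** 2) * prec * rec / (beta ** 2 * prec + rec)
    return round(prec, 4), round(rec, 4), round(f, 4)

print(span_scores(2250, 903, 5211))  # -> (0.7136, 0.3016, 0.5605)
```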
The combined model output for threshold 0.8 is evaluated by
./scripts/evaluate.sh -i data/wi+locness/m2/ABCN.dev.gold.bea19.m2 -r dump/reranking/pie_bea_ft2-gector/0.8_alpha=0.10_1.00_staged.output
and produces
=========== Span-Based Correction ============
TP FP FN Prec Rec F0.5
2567 1147 4894 0.6912 0.3441 0.5751
==============================================
The larger checkpoints/clang_large_ft2-gector/checkpoint_2.pt checkpoint is used analogously:
python apply_staged_model.py -c checkpoints/clang_large_ft2-gector -C checkpoint_2.pt \
-i data/wi+locness/m2/ABCN.dev.gold.bea19.m2 -v data/bea_reranking/gector_variants/bea.dev.variants \
-O dump/reranking -m roberta-large -s 8 -a 0.1
./scripts/evaluate.sh -i data/wi+locness/m2/ABCN.dev.gold.bea19.m2 -r dump/reranking/clang_large_ft2-gector/0.8_alpha=0.10_1.00_staged.output
=========== Span-Based Correction ============
TP FP FN Prec Rec F0.5
2678 1136 4783 0.7021 0.3589 0.5894
==============================================
- To generate the outputs on the test set, run
python apply_staged_model.py -c checkpoints/clang_large_ft2-gector -C checkpoint_2.pt \
-v data/wi+locness/test/ABCN.test.bea19.orig -O dump/test_output -s 8 -a 0.1 -r
The *.output files for different threshold values are available in OUTPUT_FOLDER (dump/test_output in our case).
The only difference for Russian is that we use the M2Scorer for evaluation:
python apply_staged_model.py -c checkpoints/ru_200K_gpt_ft1 -C checkpoint_2.pt -O dump/reranking \
-v data/russian_reranking/gpt/test.variants -i data/russian/RULEC-GEC.test.M2 -m sberbank-ai/ruRoberta-large \
-s 5 -a 0.1
python scripts/m2scorer/scripts/m2scorer.py dump/reranking/ru_200K_gpt_ft1/0.7_alpha\=0.10_1.00_staged.output data/russian/RULEC-GEC.test.M2
Precision : 0.7367
Recall : 0.2733
F_0.5 : 0.5502
python train.py -t TRAIN_VARIANTS_PATH -T TEST_VARIANTS_PATH -M 768 --loss_by_class -e EPOCHS
-c CHECKPOINT_FOLDER [-L INITIAL_CHECKPOINT_PATH] [-E RECALL_ESTIMATE] [-m MODEL_NAME; DEFAULT=roberta-base] --save_all_checkpoints
--only_generated
- English, finetuning on W&I-LOCNESS train set using GECToR-generated edits:
python train.py -t data/bea_reranking/gector_variants/bea.train.variants -T \
data/bea_reranking/gector_variants/bea.dev.variants -M 768 --loss_by_class -e 3 \
-c checkpoints/pie_bea_ft2_rerun-gector -L checkpoints/pie_bea-gector/checkpoint_2.pt \
-E 0.4 --save_all_checkpoints --only_generated
- Russian, finetuning on RULEC-GEC data:
python train.py -t data/russian_reranking/gpt/train.variants -T data/russian_reranking/gpt/dev.variants \
-M 768 --loss_by_class -e 5 -c checkpoints/ru_200K_gpt_ft1 -L checkpoints/ru_200K_gpt/checkpoint_1.pt \
-E 0.4 --save_all_checkpoints -m sberbank-ai/ruRoberta-large --only_generated