If you are interested in our work, please cite:
@inproceedings{zhang-etal-2023-ctc,
title = {Non-autoregressive Text Editing with Copy-aware Latent Alignments},
author = {Zhang, Yu and
Zhang, Yue and
Cui, Leyang and
Fu, Guohong},
booktitle = {Proceedings of EMNLP},
year = {2023},
address = {Singapore}
}
The following packages should be installed:
PyTorch: >= 2.0
Transformers
Errant
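A quick way to confirm that the requirements above are met is to query the installed package metadata. The snippet below is a minimal sketch, not part of this repo: the helper names `installed_versions` and `major_at_least` are hypothetical, and it assumes the packages are installed under the pip distribution names `torch`, `transformers` and `errant`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def major_at_least(ver: str, major: int) -> bool:
    """Check the leading (major) component of a version string such as '2.1.0+cu118'."""
    return int(ver.split(".")[0]) >= major


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them if your environment differs.
    versions = installed_versions(["torch", "transformers", "errant"])
    for name, ver in versions.items():
        print(f"{name}: {ver or 'not installed'}")
    torch_ver = versions["torch"]
    if torch_ver is not None and not major_at_least(torch_ver, 2):
        print("PyTorch is older than 2.0; please upgrade.")
```

The check only reads package metadata, so it does not import PyTorch and runs quickly even on machines without a GPU.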
Clone this repo recursively:
git clone https://github.com/yzhangcs/ctc-copy.git --recursive
You can follow this repo to obtain the 3-stage train/dev/test data for training an English GEC model. The multilingual datasets are available here.
Before running, you are required to preprocess each sentence pair into the format of SRC:\t[src]\nTGT:\t[tgt]\n, where src and tgt are the source and target sentences, respectively. Each sentence pair is separated by a blank line.
See data/clang8.toy for examples.
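The format described above can be produced with a few lines of Python. This is a minimal sketch under stated assumptions, not the preprocessing script used by this repo: the function names `format_pairs` and `parse_pairs` are hypothetical, each sentence is assumed to fit on a single line, and data/clang8.toy remains the authoritative reference for the expected layout.

```python
from typing import Iterable, List, Tuple


def format_pairs(pairs: Iterable[Tuple[str, str]]) -> str:
    """Render (src, tgt) pairs as SRC:\\t[src]\\nTGT:\\t[tgt]\\n records.

    Records are joined with a newline, which leaves one blank line
    between consecutive sentence pairs.
    """
    records = [f"SRC:\t{src.strip()}\nTGT:\t{tgt.strip()}\n" for src, tgt in pairs]
    return "\n".join(records)


def parse_pairs(text: str) -> List[Tuple[str, str]]:
    """Read the records back, mainly to sanity-check a preprocessed file."""
    pairs: List[Tuple[str, str]] = []
    for block in text.strip().split("\n\n"):
        if not block.strip():
            continue  # skip empty input or stray blank blocks
        src_line, tgt_line = block.split("\n")[:2]
        # Everything after the first tab is the sentence itself.
        pairs.append((src_line.split("\t", 1)[1], tgt_line.split("\t", 1)[1]))
    return pairs


if __name__ == "__main__":
    toy = [
        ("She go to school .", "She goes to school ."),
        ("I has a pen .", "I have a pen ."),
    ]
    print(format_pairs(toy), end="")
```

Reading the output back with `parse_pairs` and comparing it to the input is a cheap way to catch stray tabs or embedded newlines before training.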
Try the following command to train a 3-stage English model:
bash train.sh
To make predictions & evaluations:
bash pred.sh
If you have any questions, please feel free to email me.