This repository contains the source code for cross-lingual name tagging with multi-level adversarial training [1].
Requirements: Python 3, PyTorch
Label format
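The name tagger follows the BIO or BIOES labeling scheme. In BIO, B- marks the first token of a name, I- marks the following tokens of the same name, and O marks tokens outside any name. BIOES additionally uses E- for the last token of a multi-token name and S- for a single-token name.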
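The relationship between the two schemes can be illustrated with a small conversion routine. This is a minimal sketch, not code from this repository: the function name `bio_to_bioes` is hypothetical, and it assumes the input labels of one sentence are well-formed BIO.

```python
def bio_to_bioes(labels):
    """Convert the BIO labels of one sentence to BIOES (illustrative sketch)."""
    bioes = []
    for i, label in enumerate(labels):
        if label == "O":
            bioes.append(label)
            continue
        prefix, entity_type = label.split("-", 1)
        next_label = labels[i + 1] if i + 1 < len(labels) else "O"
        # The name continues only if the next label is I- with the same type.
        continues = next_label == "I-" + entity_type
        if prefix == "B":
            # A B- token with no continuation is a single-token name.
            bioes.append(("B-" if continues else "S-") + entity_type)
        else:
            # An I- token with no continuation is the end of the name.
            bioes.append(("I-" if continues else "E-") + entity_type)
    return bioes


bio = ["B-PER", "I-PER", "I-PER", "O", "O", "B-GPE", "O", "O"]
print(bio_to_bioes(bio))
```

Under these assumptions, a three-token person name ends in E-PER and a one-token GPE name becomes S-GPE.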
Sentence format
Each document is segmented into sentences, and each sentence is tokenized.
In the training file, sentences are separated by an empty line and tokens are separated by a line break, so each line holds one token. The label must always be the last field on the line, and the token and label are separated by a space.
Example:
George B-PER
W. I-PER
Bush I-PER
went O
to O
Germany B-GPE
yesterday O
. O

New B-ORG
York I-ORG
Times I-ORG
A real example of a BIO file:
example/data/eng.train.bio
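The sentence format described above can be read with a few lines of Python. This is a minimal sketch rather than the loader used by this repository: the function name `read_bio` is hypothetical, and it assumes the token is the first whitespace-separated field on each line and the label is the last.

```python
def read_bio(lines):
    """Yield (tokens, labels) for each sentence in an iterable of lines."""
    tokens, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # An empty line closes the current sentence.
            if tokens:
                yield tokens, labels
                tokens, labels = [], []
            continue
        fields = line.split()
        tokens.append(fields[0])   # assumption: token is the first field
        labels.append(fields[-1])  # label is always the last field
    if tokens:
        # The file may end without a trailing empty line.
        yield tokens, labels


sample = """George B-PER
W. I-PER
Bush I-PER
went O
to O
Germany B-GPE
yesterday O
. O

New B-ORG
York I-ORG
Times I-ORG
"""
sentences = list(read_bio(sample.splitlines()))
for tokens, labels in sentences:
    print(list(zip(tokens, labels)))
```

To read a file instead of a string, an open file object can be passed directly, for example `read_bio(open(path, encoding="utf-8"))`, where `path` points to a file in this format.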
A training example is provided in example/seq_labeling_naacl/.
[1] Lifu Huang, Heng Ji, Jonathan May. Cross-lingual Multi-Level Adversarial Transfer to Enhance Low-Resource Name Tagging. Proc. NAACL, 2019.