# NER with Wikipedia Distant Supervision Contextualized Embeddings

This repository contains the source code for the NER system presented in the following research publication:

Abbas Ghaddar and Philippe Langlais, *Contextualized Word Representations from Distant Supervision with and for NER*, in Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
This code is based on the original BERT implementation.

## Requirements

- python 3.6
- tensorflow>=1.13
- pyhocon (for parsing the configurations)
- fasttext==0.8.3
## Setup

- Follow the instructions in `/data` to obtain the data, and change the `data_dir` path in the `experiments.config` file.
- Change the `raw_path` variables for the CoNLL and OntoNotes datasets in the `experiments.config` file to `path/to/conll-2003` and `path/to/conll-2012/v4/data` respectively. For the CoNLL dataset, please rename the `eng.train`, `eng.testa`, and `eng.testb` files to `conll.train.txt`, `conll.dev.txt`, and `conll.test.txt` respectively. Also, change `DATA_DIR` in `train_ner.sh` and `cache_emb.sh`.
- Run:

$ python preprocess.py {conll|ontonotes}
$ cd data
$ sh cache_emb.sh {conll|ontonotes}
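The exact layout of `experiments.config` is not reproduced here; since the project parses it with pyhocon, the edits above might look like the following HOCON sketch (all keys except `data_dir` and `raw_path` are hypothetical, and the values are placeholders for your local paths):

```hocon
# Sketch only -- match the key names to your copy of experiments.config.
data_dir = /path/to/processed/data

conll {
  raw_path = path/to/conll-2003
}

ontonotes {
  raw_path = path/to/conll-2012/v4/data
}
```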
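The CoNLL renaming step above can be done with three `mv` commands. A minimal sketch, run here against a scratch directory with stand-in files so it is self-contained; in practice, run the `mv` lines inside your local `conll-2003` directory:

```shell
# Demo of the renaming step on a scratch directory.
mkdir -p conll-2003-demo && cd conll-2003-demo
touch eng.train eng.testa eng.testb   # stand-ins for the real CoNLL-2003 files

# The actual renames expected by preprocess.py:
mv eng.train conll.train.txt
mv eng.testa conll.dev.txt
mv eng.testb conll.test.txt
```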
## Train and test

Once the data preprocessing is complete, you can train and test a model with:

$ cd data
$ sh train_ner.sh {conll|ontonotes}
## Citation

Please cite the following paper when using our code:
@inproceedings{ghaddar2019contextualized,
title={Contextualized Word Representations from Distant Supervision with and for NER},
author={Ghaddar, Abbas and Langlais, Philippe},
booktitle={Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)},
pages={101--108},
year={2019}
}