LexiconAugmentedNER

This is the implementation of our arXiv paper "Simplify the Usage of Lexicon in Chinese NER", which avoids complicated operations for incorporating a word lexicon into Chinese NER. We show that incorporating a lexicon into Chinese NER can be both simple and effective.
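To illustrate the kind of lexicon matching involved, the sketch below collects, for each character, the lexicon words that Begin at it, pass through it (Middle), End at it, or consist of it alone (Single). As we understand the paper, each of these four word sets is then pooled into a vector (weighted by word frequency) and concatenated with the character representation; that pooling step is omitted here. `match_lexicon`, its `max_len` cutoff and the toy lexicon are illustrative assumptions, not code from this repository.

```python
from typing import Dict, List, Set


def match_lexicon(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Dict[str, Set[str]]]:
    """For every character, collect the lexicon words that Begin at it, contain it in
    the Middle, End at it, or are the Single character itself."""
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        # Only candidate words of up to max_len characters are checked (an assumption).
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                sets[i]["S"].add(word)
            else:
                sets[i]["B"].add(word)
                sets[j - 1]["E"].add(word)
                for k in range(i + 1, j - 1):
                    sets[k]["M"].add(word)
    return sets


# Hypothetical toy lexicon, for illustration only.
lexicon = {"大鹏", "大鹏湾", "鹏湾", "的", "湿地"}
for char, groups in zip("大鹏湾的湿地", match_lexicon("大鹏湾的湿地", lexicon)):
    print(char, {tag: sorted(words) for tag, words in groups.items()})
```

For 鹏, for instance, 鹏湾 lands in its B set, 大鹏湾 in its M set and 大鹏 in its E set, so the character sees every matched word it belongs to without any lattice structure.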

Source code description

Requirement:

Python 3.6, PyTorch 0.4.1

Input format:

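CoNLL format: each line holds one character and its label, separated by whitespace, and (as is usual for CoNLL) sentences are separated by blank lines. The "BMES" tag scheme is preferred. For example, the sentence 别错过邻近大鹏湾的湿地 ("Don't miss the wetlands near Dapeng Bay") is written as: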

别 O
错 O
过 O
邻 O
近 O
大 B-LOC
鹏 M-LOC
湾 E-LOC
的 O
湿 O
地 O
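A minimal reader for this format, plus a decoder that turns BMES labels back into entity spans, might look like the sketch below. Both `read_conll` and `bmes_to_spans` are hypothetical helpers written for illustration; the repository's own data loader may handle the format differently.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group 'char label' lines into sentences; a blank line ends a sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines:
        parts = line.split()
        if not parts:
            if current:
                sentences.append(current)
                current = []
            continue
        # First column is the character, last column is the label.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def bmes_to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Decode BMES labels into (start, end_inclusive, type) spans.
    Ill-formed sequences (e.g. B-LOC followed by E-PER) are dropped."""
    spans: List[Tuple[int, int, str]] = []
    start, start_type = None, None
    for i, label in enumerate(labels):
        if label == "O":
            start, start_type = None, None
            continue
        prefix, ent_type = label.split("-", 1)
        if prefix == "S":
            spans.append((i, i, ent_type))
            start, start_type = None, None
        elif prefix == "B":
            start, start_type = i, ent_type
        elif prefix == "M" and ent_type != start_type:
            start, start_type = None, None
        elif prefix == "E":
            if start is not None and ent_type == start_type:
                spans.append((start, i, ent_type))
            start, start_type = None, None
    return spans


example = ["别 O", "错 O", "过 O", "邻 O", "近 O", "大 B-LOC", "鹏 M-LOC",
           "湾 E-LOC", "的 O", "湿 O", "地 O"]
sentence = read_conll(example)[0]
chars = [c for c, _ in sentence]
for start, end, ent_type in bmes_to_spans([label for _, label in sentence]):
    print(ent_type, "".join(chars[start:end + 1]))  # prints: LOC 大鹏湾
```

To read a real data file, pass an open file object (e.g. `open(path, encoding="utf-8")`) instead of the list.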

Pretrained embeddings:

The pretrained embeddings (word, character and bichar embeddings) are the same as those used by Lattice LSTM (https://github.com/jiesutd/LatticeLSTM).
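The sketch below assumes the embedding files use the common plain-text layout of one token followed by its vector values per line. It loads such a file into a dictionary and shows how a sentence can be turned into the character bigrams ("bichars") that are looked up in the bichar table. `load_embeddings`, `to_bichars` and the `</s>` padding token are hypothetical; check Lattice LSTM's data utilities for the exact conventions.

```python
from typing import Dict, Iterable, List


def load_embeddings(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse text-format embeddings: one token followed by its vector per line."""
    table: Dict[str, List[float]] = {}
    dim = None
    for line in lines:
        parts = line.split()
        # Skip blank lines and a possible "vocab_size dim" header line
        # (this assumes vectors have more than one dimension).
        if len(parts) <= 2:
            continue
        token, values = parts[0], [float(x) for x in parts[1:]]
        if dim is None:
            dim = len(values)
        if len(values) == dim:  # ignore rows whose length disagrees with the first row
            table[token] = values
    return table


def to_bichars(chars: List[str], end_token: str = "</s>") -> List[str]:
    """Pair each character with the next one; the last character is paired with
    a padding token (the token this repository uses is an assumption)."""
    return [c + nxt for c, nxt in zip(chars, chars[1:] + [end_token])]


toy = ["3 2", "大 0.1 0.2", "鹏 0.3 0.4", "湾 0.5 0.6"]
emb = load_embeddings(toy)
print(emb["鹏"], to_bichars(list("大鹏湾")))
```

Tokens missing from a table (out-of-vocabulary characters, words or bichars) still need a fallback vector, such as a random or zero initialization; which one the repository uses is not stated here.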

Run the code:


Download the character and word embeddings and put them in the data folder.
To train/test the demo on OntoNotes: sh train.sh / sh test.sh
To train/test on the other three datasets: change the learning rate and LSTM hidden dimension as specified in the paper, then run sh train.sh / sh test.sh
To train/test on your own data: edit the file paths in train.sh or test.sh, then run the script.

v-mipeng/LexiconAugmentedNER
