rnnmorph

Important: please see https://github.com/natasha/slovnet#morphology-1

Morphological analyzer (POS tagger) for Russian and English, based on neural networks and dictionary-lookup systems (pymorphy2, nltk).

Contacts

Telegram: @YallenGusev

Russian language, MorphoRuEval-2017 test dataset, accuracy

Domain	Full tag	PoS tag	Full tag + lemma	Sentence (full tag)	Sentence (full tag + lemma)
Lenta (news)	96.31%	98.01%	92.96%	77.93%	52.79%
VK (social)	95.20%	98.04%	92.06%	74.30%	60.56%
JZ (lit.)	95.87%	98.71%	90.45%	73.10%	43.15%
All	95.81%	98.26%	N/A	74.92%	N/A
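
The per-token and sentence-level columns measure different things. The sketch below shows one common way to compute both, assuming that a sentence counts as correct only when every token in it is tagged correctly. The function name `tagging_accuracy` and the toy data are illustrative and are not part of rnnmorph or of the official MorphoRuEval-2017 evaluation script.

```python
from typing import List, Tuple


def tagging_accuracy(gold: List[List[str]],
                     predicted: List[List[str]]) -> Tuple[float, float]:
    """Return (token-level accuracy, sentence-level accuracy)."""
    total_tokens = 0
    correct_tokens = 0
    correct_sentences = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences must be aligned")
        matches = sum(g == p for g, p in zip(gold_sent, pred_sent))
        total_tokens += len(gold_sent)
        correct_tokens += matches
        # Assumption: a sentence is correct only if all of its tokens match.
        if matches == len(gold_sent):
            correct_sentences += 1
    return correct_tokens / total_tokens, correct_sentences / len(gold)


# Toy data: one wrong tag in the first sentence, none in the second.
gold = [["NOUN", "VERB", "NOUN"], ["PRON", "VERB"]]
pred = [["NOUN", "VERB", "ADJ"], ["PRON", "VERB"]]
token_acc, sentence_acc = tagging_accuracy(gold, pred)
print(f"token: {token_acc:.2%}, sentence: {sentence_acc:.2%}")
```

A single wrong token fails the whole sentence, which is why sentence-level figures are much lower than per-token ones.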

English language, UD EWT test, accuracy

Dataset	Full tag	PoS tag	Full tag + lemma	Sentence (full tag)	Sentence (full tag + lemma)
UD EWT test	91.57%	94.10%	87.02%	63.17%	50.99%

Speed and memory consumption

Speed: 200 to 600 words per second on CPU.

Memory consumption: about 500-600 MB for single-sentence predictions.
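
To check throughput on your own hardware, a small timing helper along these lines may be enough. It is only a sketch: `words_per_second` and `dummy_predict` are hypothetical names, and the stand-in predictor exists only so the snippet runs without loading a model. In practice you would pass the `predict` method of a loaded predictor instead.

```python
import time
from typing import Callable, List, Sequence


def words_per_second(predict: Callable[[List[str]], Sequence],
                     sentences: List[List[str]]) -> float:
    """Time `predict` over tokenized sentences, one call per sentence."""
    total_words = sum(len(sentence) for sentence in sentences)
    start = time.perf_counter()
    for sentence in sentences:
        predict(sentence)
    elapsed = time.perf_counter() - start
    return total_words / elapsed if elapsed > 0 else float("inf")


def dummy_predict(words: List[str]) -> List[str]:
    # Stand-in for a real predictor; replace with predictor.predict.
    return [word.upper() for word in words]


rate = words_per_second(dummy_predict, [["мама", "мыла", "раму"]] * 1000)
print(f"{rate:.0f} words per second")
```

The result depends heavily on sentence length and hardware, so measure on text that resembles your real input.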

Install

pip install rnnmorph

Usage

Example:

from rnnmorph.predictor import RNNMorphPredictor

# Load the Russian model.
predictor = RNNMorphPredictor(language="ru")
# predict() takes one tokenized sentence and returns one form per word.
forms = predictor.predict(["мама", "мыла", "раму"])

print(forms[0].pos)          # NOUN
print(forms[0].tag)          # Case=Nom|Gender=Fem|Number=Sing
print(forms[0].normal_form)  # мама
print(forms[0].vector)
# [0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1]
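
The `tag` string shown above packs grammatical features as `Name=Value` pairs separated by `|`. If you need the features individually, a small helper such as the following can split them into a dictionary. `parse_tag` is a hypothetical helper, not part of rnnmorph, and treating `_` as "no features" is an assumption borrowed from the Universal Dependencies convention.

```python
from typing import Dict


def parse_tag(tag: str) -> Dict[str, str]:
    """Split a 'Case=Nom|Gender=Fem' style tag into a feature dictionary."""
    # Assumption: an empty string or "_" means the word has no features.
    if not tag or tag == "_":
        return {}
    return dict(pair.split("=", 1) for pair in tag.split("|"))


features = parse_tag("Case=Nom|Gender=Fem|Number=Sing")
print(features["Case"])    # Nom
print(features["Number"])  # Sing
```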

Training

Simple model training:

Acknowledgements

Anastasyev D. G., Gusev I. O., Indenbom E. M., 2018, Improving Part-of-speech Tagging Via Multi-task Learning and Character-level Word Representations
Anastasyev D. G., Andrianov A. I., Indenbom E. M., 2017, Part-of-speech Tagging with Rich Language Description, presentation
Morphological analysis track of Dialogue 2017
Track materials
Morphine by kmike, CRF classifier for MorphoRuEval-2017 by kmike
Universal Dependencies
Tobias Horsmann and Torsten Zesch, 2017, Do LSTMs really work so well for PoS tagging? – A replication study
Barbara Plank, Anders Søgaard, Yoav Goldberg, 2016, Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
