Releases
v1.2.0: Alpha tokenizers for Chinese, French, Spanish, Italian and Portuguese
✨ Major features and improvements
NEW: Support for Chinese tokenization, via Jieba.
NEW: Alpha support for French, Spanish, Italian and Portuguese tokenization.
🔴 Bug fixes
Fix issue #376: POS tags for "and/or" are now correct.
Fix issue #578: --force argument on download command now operates correctly.
Fix issue #595: Lemmatization corrected for some base forms.
Fix issue #588: Matcher now rejects empty patterns.
Fix issue #592: Added exception rule for tokenization of "Ph.D."
Fix issue #599: Empty documents are now considered tagged and parsed.
Fix issue #600: Added missing token.tag and token.tag_ setters.
Fix issue #596: Added missing unicode import when compiling regexes, which previously led to incorrect tokenization.
Fix issue #587: Resolved bug that caused Matcher to sometimes segfault.
Fix issue #429: Ensure missing entity types are added to the entity recognizer.