Add new Chinese NLP results #39

Open: wants to merge 13 commits into base: master
update other resources
yuanheTian authored Apr 5, 2022
commit ae23d81af653eee0b1b1c25cd53af224b6b4470d
2 changes: 1 addition & 1 deletion docs/word_embedding.md
@@ -146,7 +146,7 @@ Given “France : Paris :: China : ?”, a system should come up with the answer

| Name | Additional features | Training Corpus Size | Source |
| --- | --- | --- | --- |
- | [Tencent Embedding](https://ai.tencent.com/ailab/nlp/en/embedding.html) | 8M Chinese words, 200 dimension | | [Song et al. (2018)](https://aclanthology.org/N18-2028/) |
+ | [Tencent Embedding](https://ai.tencent.com/ailab/nlp/en/embedding.html) | 8M Chinese words, 200 dimension | | [Song et al. 2018](https://aclanthology.org/N18-2028/) |
| FastText | - | 374M characters | [Grave et al., 2018](https://arxiv.org/pdf/1802.06893.pdf) |
| Mimick | Interpolate between similar characters to improve rare words, multilingual | | [Pinter et al., 2017](https://www.aclweb.org/anthology/D17-1010.pdf) |
| Glyph2vec | Uses character bitmaps and Cangjie codes to address the OOV problem | 10M chars | [Chen et al., 2020](https://www.aclweb.org/anthology/2020.acl-main.256.pdf) |