Add new Chinese NLP results #39

Open: wants to merge 13 commits into base: master
Add new results for word embeddings
yuanheTian authored Apr 6, 2022
commit 35facb6e1a88b41d8ebe721649e70b19cd4abd05
4 changes: 3 additions & 1 deletion docs/word_embedding.md
@@ -61,7 +61,9 @@ See e.g. [Torregrossa et al., 2020](https://www.aclweb.org/anthology/2020.lrec-1
| System | wordsim-240 (⍴) | wordsim-296 (⍴) |
| --- | --- | --- |
| [Sun et al. (2019)](https://arxiv.org/pdf/1902.08795.pdf) (VCWE) | 57.81 | 61.29 |
| [Song et al. (2018)](https://www.ijcai.org/Proceedings/2018/0608.pdf) | 54.14 | 57.04 |
| [Yu et al. (2017)](https://www.aclweb.org/anthology/D17-1027) (JWE) | 51.92 | 59.84 |
| Baseline (CBOW) | 51.01 | 53.65 |



@@ -146,7 +148,7 @@ Given “France : Paris :: China : ?”, a system should come up with the answer

| Name | Additional features | Training Corpus Size | Source |
| --- | --- | --- | --- |
| [Tencent Embedding](https://ai.tencent.com/ailab/nlp/en/embedding.html) | 8M Chinese words, 200 dimension | | [Song et al., 2018](https://aclanthology.org/N18-2028/) |
| DSG | Leverage directional information to improve skip-gram algorithm | | [Song et al., 2018](https://aclanthology.org/N18-2028/) |
| FastText | - | 374M characters | [Grave et al., 2018](https://arxiv.org/pdf/1802.06893.pdf) |
| Mimick | Interpolate between similar characters to improve rare words, multilingual | | [Pinter et al., 2017](https://www.aclweb.org/anthology/D17-1010.pdf) |
| Glyph2vec | Uses character bitmaps, canjie to address OOV problem | 10M chars | [Chen et al., 2020](https://www.aclweb.org/anthology/2020.acl-main.256.pdf) |