From 9d0f6d946f9bf3dd7205bd7a37ffa31a411d02ff Mon Sep 17 00:00:00 2001
From: m-yoshinaka <49668018+m-yoshinaka@users.noreply.github.com>
Date: Wed, 16 Sep 2020 08:45:51 +0000
Subject: [PATCH] [update] Update README.md - Add the instruction on how to use
with Docker. - Fix "Usage" section according to changes in other files.
---
README.md | 67 +++++++++++++++++++++++++++++++++++--------------------
1 file changed, 43 insertions(+), 24 deletions(-)
diff --git a/README.md b/README.md
index 6185fcc..062f3b8 100644
--- a/README.md
+++ b/README.md
@@ -3,6 +3,7 @@
**SAPPHIRE** is a simple monolingual phrase aligner based on word embeddings.
We explain the details of SAPPHIRE in the following paper.
+[[PDF]](https://www.aclweb.org/anthology/2020.lrec-1.847.pdf)
```
@inproceedings{yoshinaka-etal-2020,
author = {Yoshinaka, Masato and Kajiwara, Tomoyuki and Arase, Yuki},
@@ -19,8 +20,8 @@ We explain the details of SAPPHIRE in the following paper.
SAPPHIRE depends only on a pre-trained word embedding.
Therefore, it is easily transferable to specific domains and different languages.
-This library is designed for a pre-trained model of [fastText](https://fasttext.cc/).
-But it is easy to replace the model.
+This tool is designed for use with a pre-trained [fastText](https://fasttext.cc/) model.
+(The word embedding can easily be replaced with another one.)
## Requirements
@@ -30,27 +31,37 @@ But it is easy to replace the model.
- fasttext
-## Installation (for fastText version)
+## Installation
-1. Install requirements
-After cloning this repository, go to the root directory and install requirements.
+1. Download a pre-trained fastText model
+(or prepare your own fastText model) and move it to the *model* directory.
```
-$ pip install -r requirements.txt
+$ curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.bin.zip
+$ unzip wiki-news-300d-1M-subword.bin.zip
+$ mv wiki-news-300d-1M-subword.bin model/
```
-2. Install SAPPHIRE
-Installation with `develop` option allows you to change the parameters and add scripts for other word representations.
+### Docker
+1. Build the Docker image:
```
-$ python setup.py develop
+$ docker build -t sapphire .
+```
+2. Run a container:
+```
+$ docker run -it --rm -v ${PWD}/model:/work/model sapphire:latest /bin/bash
+# python
+>>> from sapphire import Sapphire
```
-
-3. Download the pre-trained model of fastText (or prepare your model of fastText) and move it to *model* directory.
+### Local installation
+1. Install requirements:
```
-$ curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.bin.zip
-$ unzip wiki-news-300d-1M-subword.bin.zip
-$ mkdir model
-$ mv wiki-news-300d-1M-subword.bin model/
+$ pip install -r requirements.txt
+```
+2. Install SAPPHIRE with the `develop` option
+(which lets you add scripts for other word representations):
+```
+$ python setup.py develop
```
@@ -60,20 +71,28 @@ $ mv wiki-news-300d-1M-subword.bin model/
```
$ python run_sapphire.py model/wiki-news-300d-1M-subword.bin
```
-To stop SAPPHIRE, enter `EXIT` when inputting a sentence.
+To stop SAPPHIRE, press `Ctrl-C` while it is waiting for a sentence.
### Usage of the SAPPHIRE module
```
+>>> import fasttext
>>> from sapphire import Sapphire
->>> aligner = Sapphire()
+>>> model = fasttext.FastText.load_model(path_to_your_model)
+>>> aligner = Sapphire(model)
+```
+To change the hyper-parameters, call `set_params`:
```
-After preparing a **tokenized** sentence pair (`tokenized_sentence_a: list` and `tokenized_sentence_b: list`),
+>>> aligner.set_params(lambda_=0.6, delta=0.6, alpha=0.01, hungarian=False)
```
->>> result = aligner.align(tokenized_sentence_a, tokenized_sentence_b)
->>> alignment = result.top_alignment[0][0]
->>> print(alignment)
+After preparing a **tokenized** sentence pair
+(`tokenized_sentence_a: list` and `tokenized_sentence_b: list`), call `align`:
+```
+>>> _, alignment = aligner.align(tokenized_sentence_a, tokenized_sentence_b)
+>>> alignment
[(1, 3, 2, 3), (8, 9, 5, 6), (13, 13, 8, 8), (27, 27, 9, 9)]
```
-phrase pair :
-
- \# 1-indexed alignment
+
+- Phrase pair
+is represented as
+.
+- Outputs of SAPPHIRE are 1-indexed alignments.