[update] Update README.md
- Add the instruction on how to use with Docker.
- Fix "Usage" section according to changes in other files.
m-yoshinaka committed Sep 16, 2020
1 parent c0716b1 commit 9d0f6d9
Showing 1 changed file with 43 additions and 24 deletions.
--- a/README.md
+++ b/README.md
@@ -3,6 +3,7 @@
 **SAPPHIRE** is a simple monolingual phrase aligner based on word embeddings.
 
 We explain the details of SAPPHIRE in the following paper.
+[[PDF]](https://www.aclweb.org/anthology/2020.lrec-1.847.pdf)
 ```
 @inproceedings{yoshinaka-etal-2020,
 author = {Yoshinaka, Masato and Kajiwara, Tomoyuki and Arase, Yuki},
@@ -19,8 +20,8 @@ We explain the details of SAPPHIRE in the following paper.
 
 SAPPHIRE depends only on a pre-trained word embedding.
 Therefore, it is easily transferable to specific domains and different languages.
-This library is designed for a pre-trained model of [fastText](https://fasttext.cc/).
-But it is easy to replace the model.
+This tool is designed for a pre-trained model of [fastText](https://fasttext.cc/).
+(Of course, it is easy to replace the word embedding.)
 
 
 ## Requirements
@@ -30,27 +31,37 @@ But it is easy to replace the model.
 - fasttext
 
 
-## Installation (for fastText version)
+## Installation
 
-1. Install requirements
-After cloning this repository, go to the root directory and install requirements.
+1. Download the pre-trained model of fastText
+(or prepare your model of fastText) and move it to *model* directory.
 ```
-$ pip install -r requirements.txt
+$ curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.bin.zip
+$ unzip wiki-news-300d-1M-subword.bin.zip
+$ mv wiki-news-300d-1M-subword.bin model/
 ```
-
-2. Install SAPPHIRE
-Installation with `develop` option allows you to change the parameters and add scripts for other word representations.
+### Docker
+1. Build the Docker image:
 ```
-$ python setup.py develop
+$ docker build -t sapphire .
 ```
+2. Run a container:
+```
+$ docker run -it --rm -v ${PWD}/model:/work/model sapphire:latest /bin/bash
+# python
+>>> from sapphire import Sapphire
+```
 
 
-3. Download the pre-trained model of fastText (or prepare your model of fastText) and move it to *model* directory.
+### Local installation
+1. Install requirements:
 ```
-$ curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.bin.zip
-$ unzip wiki-news-300d-1M-subword.bin.zip
-$ mkdir model
-$ mv wiki-news-300d-1M-subword.bin model/
+$ pip install -r requirements.txt
+```
+2. Install SAPPHIRE using `develop` option
+(that allows you to add scripts for other word representations):
+```
+$ python setup.py develop
 ```
 
 
@@ -60,20 +71,28 @@ $ mv wiki-news-300d-1M-subword.bin model/
 ```
 $ python run_sapphire.py model/wiki-news-300d-1M-subword.bin
 ```
-To stop SAPPHIRE, enter `EXIT` when inputting a sentence.
+To stop SAPPHIRE, enter `Ctrl-C` when inputting a sentence.
 
 ### Usage of the SAPPHIRE module
 ```
+>>> import fasttext
 >>> from sapphire import Sapphire
->>> aligner = Sapphire()
+>>> model = fasttext.FastText.load_model(path_to_your_model)
+>>> aligner = Sapphire(model)
+```
+If you change the hyper-parameters,
 ```
-After preparing a **tokenized** sentence pair (`tokenized_sentence_a: list` and `tokenized_sentence_b: list`),
+>>> aligner.set_params(lambda_=0.6, delta=0.6, alpha=0.01, hungarian=False)
 ```
->>> result = aligner.align(tokenized_sentence_a, tokenized_sentence_b)
->>> alignment = result.top_alignment[0][0]
->>> print(alignment)
+After preparing a **tokenized** sentence pair
+(`tokenized_sentence_a: list` and `tokenized_sentence_b: list`),
+```
+>>> _, alignment = aligner.align(tokenized_sentence_a, tokenized_sentence_b)
+>>> alignment
 [(1, 3, 2, 3), (8, 9, 5, 6), (13, 13, 8, 8), (27, 27, 9, 9)]
 ```
-phrase pair <img src="https://render.githubusercontent.com/render/math?math={(x, y)}"> :
-<img src="https://render.githubusercontent.com/render/math?math={(x_\text{start}, x_\text{end}, y_\text{start}, y_\text{end})}">
-\# 1-indexed alignment
+
+- Phrase pair <img src="https://render.githubusercontent.com/render/math?math={(x,y)}">
+is represented as
+<img src="https://render.githubusercontent.com/render/math?math={(x_\text{start},x_\text{end},y_\text{start},y_\text{end})}">.
+- Outputs of SAPPHIRE are 1-indexed alignments.

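The updated README asks for a **tokenized** sentence pair, passed to `aligner.align` as two lists of strings, but does not say which tokenizer to use. A minimal sketch of producing such lists, assuming a crude regex split is acceptable for illustration; `simple_tokenize` is a hypothetical helper, not part of SAPPHIRE, and the authors' own preprocessing may differ.

```python
import re
from typing import List

# Crude stand-in tokenizer (an assumption, not the authors' preprocessing):
# keeps runs of word characters together and splits off each punctuation mark.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(sentence: str) -> List[str]:
    """Split a raw sentence into the list-of-strings form align() expects."""
    return _TOKEN_RE.findall(sentence)


if __name__ == "__main__":
    print(simple_tokenize("SAPPHIRE aligns phrases, not just words."))
    # → ['SAPPHIRE', 'aligns', 'phrases', ',', 'not', 'just', 'words', '.']
```

The alignment positions refer to these tokens, so keep the same lists around to read the aligned phrases back. Note that this regex splits contractions such as "don't" into three tokens.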

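Per the updated README, each phrase pair comes back as `(x_start, x_end, y_start, y_end)` with 1-indexed positions. The single-token pair `(13, 13, 8, 8)` in the example suggests the end positions are inclusive; under that assumption, a minimal sketch that maps an alignment back to phrase strings (`extract_phrase_pairs` is a hypothetical helper, and the toy sentences are made up):

```python
from typing import List, Sequence, Tuple

# (x_start, x_end, y_start, y_end), 1-indexed as described in the README
Alignment = Tuple[int, int, int, int]


def extract_phrase_pairs(
    tokens_a: Sequence[str],
    tokens_b: Sequence[str],
    alignment: Sequence[Alignment],
) -> List[Tuple[str, str]]:
    """Map 1-indexed, end-inclusive spans back to phrase strings.

    Hypothetical helper: assumes end positions are inclusive, which the
    single-token pair (13, 13, 8, 8) in the README example suggests.
    """
    pairs = []
    for x_start, x_end, y_start, y_end in alignment:
        if not (1 <= x_start <= x_end <= len(tokens_a)):
            raise ValueError(f"span ({x_start}, {x_end}) out of range for sentence A")
        if not (1 <= y_start <= y_end <= len(tokens_b)):
            raise ValueError(f"span ({y_start}, {y_end}) out of range for sentence B")
        # Shift the start to 0-indexed; a 1-indexed inclusive end equals the
        # 0-indexed exclusive slice end, so it is used unchanged.
        phrase_a = " ".join(tokens_a[x_start - 1 : x_end])
        phrase_b = " ".join(tokens_b[y_start - 1 : y_end])
        pairs.append((phrase_a, phrase_b))
    return pairs


if __name__ == "__main__":
    # Toy tokenized sentences, made up for illustration only.
    a = "the cat sat on the mat".split()
    b = "a cat was sitting on a mat".split()
    print(extract_phrase_pairs(a, b, [(2, 2, 2, 2), (3, 4, 3, 5)]))
    # → [('cat', 'cat'), ('sat on', 'was sitting on')]
```

Because the inclusive 1-indexed end coincides with Python's exclusive slice end, only the start index needs shifting by one.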