From 9d0f6d946f9bf3dd7205bd7a37ffa31a411d02ff Mon Sep 17 00:00:00 2001
From: m-yoshinaka <49668018+m-yoshinaka@users.noreply.github.com>
Date: Wed, 16 Sep 2020 08:45:51 +0000
Subject: [PATCH] [update] Update README.md

- Add the instruction on how to use with Docker.
- Fix "Usage" section according to changes in other files.
---
 README.md | 67 +++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index 6185fcc..062f3b8 100644
--- a/README.md
+++ b/README.md
@@ -3,6 +3,7 @@
 **SAPPHIRE** is a simple monolingual phrase aligner based on word embeddings.
 
 We explain the details of SAPPHIRE in the following paper.
+[[PDF]](https://www.aclweb.org/anthology/2020.lrec-1.847.pdf)
 ```
 @inproceedings{yoshinaka-etal-2020,
     author = {Yoshinaka, Masato and Kajiwara, Tomoyuki and Arase, Yuki},
@@ -19,8 +20,8 @@ We explain the details of SAPPHIRE in the following paper.
 SAPPHIRE depends only on a pre-trained word embedding.
 Therefore, it is easily transferable to specific domains and different languages.
 
-This library is designed for a pre-trained model of [fastText](https://fasttext.cc/).
-But it is easy to replace the model.
+This tool is designed for a pre-trained model of [fastText](https://fasttext.cc/).
+(Of course, it is easy to replace the word embedding.)
 
 
 ## Requirements
@@ -30,27 +31,37 @@ But it is easy to replace the model.
 - fasttext
 
 
-## Installation (for fastText version)
+## Installation
 
-1. Install requirements
-After cloning this repository, go to the root directory and install requirements.
+1. Download the pre-trained model of fastText
+(or prepare your model of fastText) and move it to *model* directory.
 ```
-$ pip install -r requirements.txt
+$ curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.bin.zip
+$ unzip wiki-news-300d-1M-subword.bin.zip
+$ mv wiki-news-300d-1M-subword.bin model/
 ```
 
-2. Install SAPPHIRE
-Installation with `develop` option allows you to change the parameters and add scripts for other word representations.
+### Docker
+1. Build the Docker image:
 ```
-$ python setup.py develop
+$ docker build -t sapphire .
+```
+2. Run a container:
+```
+$ docker run -it --rm -v ${PWD}/model:/work/model sapphire:latest /bin/bash
+# python
+>>> from sapphire import Sapphire
 ```
 
-
-3. Download the pre-trained model of fastText (or prepare your model of fastText) and move it to *model* directory.
+### Local installation
+1. Install requirements:
 ```
-$ curl -O https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.bin.zip
-$ unzip wiki-news-300d-1M-subword.bin.zip
-$ mkdir model
-$ mv wiki-news-300d-1M-subword.bin model/
+$ pip install -r requirements.txt
+```
+2. Install SAPPHIRE using `develop` option
+(that allows you to add scripts for other word representations):
+```
+$ python setup.py develop
 ```
 
 
@@ -60,20 +71,28 @@ $ mv wiki-news-300d-1M-subword.bin model/
 ```
 $ python run_sapphire.py model/wiki-news-300d-1M-subword.bin
 ```
-To stop SAPPHIRE, enter `EXIT` when inputting a sentence.
+To stop SAPPHIRE, enter `Ctrl-C` when inputting a sentence.
 
 ### Usage of the SAPPHIRE module
 ```
+>>> import fasttext
 >>> from sapphire import Sapphire
->>> aligner = Sapphire()
+>>> model = fasttext.FastText.load_model(path_to_your_model)
+>>> aligner = Sapphire(model)
+```
+If you change the hyper-parameters,
 ```
-After preparing a **tokenized** sentence pair (`tokenized_sentence_a: list` and `tokenized_sentence_b: list`),
+>>> aligner.set_params(lambda_=0.6, delta=0.6, alpha=0.01, hungarian=False)
 ```
->>> result = aligner.align(tokenized_sentence_a, tokenized_sentence_b)
->>> alignment = result.top_alignment[0][0]
->>> print(alignment)
+After preparing a **tokenized** sentence pair
+(`tokenized_sentence_a: list` and `tokenized_sentence_b: list`),
+```
+>>> _, alignment = aligner.align(tokenized_sentence_a, tokenized_sentence_b)
+>>> alignment
 [(1, 3, 2, 3), (8, 9, 5, 6), (13, 13, 8, 8), (27, 27, 9, 9)]
 ```
-phrase pair :
-
- \# 1-indexed alignment
+
+- Phrase pair
+is represented as
+.
+- Outputs of SAPPHIRE are 1-indexed alignments.
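
The updated "Usage" section returns alignments as 4-tuples and states that they are 1-indexed. A minimal sketch of how such output might be mapped back to phrase strings is given here. It assumes each tuple is `(start_a, end_a, start_b, end_b)` with inclusive bounds; the patch text does not spell this layout out, so treat it as an assumption. `alignment_to_phrases` is a hypothetical helper, not part of SAPPHIRE, and the sample sentences and alignment are made up for illustration rather than produced by the aligner.

```python
from typing import List, Tuple

# Assumed tuple layout: (start_a, end_a, start_b, end_b), 1-indexed, inclusive.
Span = Tuple[int, int, int, int]


def alignment_to_phrases(
    tokens_a: List[str], tokens_b: List[str], alignment: List[Span]
) -> List[Tuple[str, str]]:
    """Convert 1-indexed inclusive spans into pairs of phrase strings."""
    pairs = []
    for start_a, end_a, start_b, end_b in alignment:
        # Shift the start by one for 0-based slicing; the inclusive end
        # already equals the exclusive end of a Python slice.
        phrase_a = " ".join(tokens_a[start_a - 1:end_a])
        phrase_b = " ".join(tokens_b[start_b - 1:end_b])
        pairs.append((phrase_a, phrase_b))
    return pairs


# Illustrative input only (not actual SAPPHIRE output).
tokens_a = "the quick brown fox".split()
tokens_b = "a fast brown fox".split()
alignment = [(2, 2, 2, 2), (3, 4, 3, 4)]

print(alignment_to_phrases(tokens_a, tokens_b, alignment))
# [('quick', 'fast'), ('brown fox', 'brown fox')]
```

Keeping the index shift in one helper avoids off-by-one slips when the 1-indexed output is mixed with 0-indexed Python lists.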