renaming stanfordnlp to stanza!!

SajanShetty · Mar 6, 2020 · 451a923
1 parent 289f148
commit 451a923

Showing 101 changed files with 416 additions and 416 deletions.
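
The rename pattern visible in the hunks below can be sketched as a small text rewrite. This is an illustration only, not the tool used to produce this commit: `rename_references` and `_PACKAGE` are hypothetical names, and the rule for leaving the GitHub organization name (`github.com/stanfordnlp/`, `stanfordnlp.github.io`, and the `stanfordnlp/` half of an `org/repo` pair) untouched is an assumption inferred from the changed lines. It would also rewrite some lines the commit leaves alone, such as the README badge URLs.

```python
import re

# Hypothetical pattern: the lowercase package name, skipped where it is the
# GitHub organization (after "github.com/", before ".github.io", or as the
# "org/" half of "stanfordnlp/stanfordnlp").
_PACKAGE = re.compile(
    r"(?<!github\.com/)stanfordnlp(?!\.github\.io)(?!/stanfordnlp)"
)


def rename_references(text: str) -> str:
    """Rewrite old-name references in one string to the new name."""
    # Module paths, directory names, and the repository name.
    text = _PACKAGE.sub("stanza", text)
    # Environment variables such as STANFORDNLP_TEST_HOME.
    text = text.replace("STANFORDNLP", "STANZA")
    # Prose mentions of the library name.
    text = text.replace("StanfordNLP", "Stanza")
    return text


if __name__ == "__main__":
    print(rename_references("from stanfordnlp.server import CoreNLPClient"))
    print(rename_references("https://github.com/stanfordnlp/stanfordnlp/issues"))
```

Applying a function like this per file and reviewing the result with `git diff` would catch cases the assumed rule gets wrong before anything is committed.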
diff --git a/.travis.yml b/.travis.yml
@@ -10,12 +10,12 @@ install:
   - wget $CORENLP_URL -O corenlp.zip
   - unzip corenlp.zip
   - mv $CORENLP_VERSION $CORENLP_HOME
-  - mkdir ~/stanfordnlp_test
-  - mkdir ~/stanfordnlp_test/in
-  - mkdir ~/stanfordnlp_test/out
-  - mkdir ~/stanfordnlp_test/scripts
-  - cp tests/data/external_server.properties ~/stanfordnlp_test/scripts
-  - cp tests/data/example_french.json ~/stanfordnlp_test/out
-  - export STANFORDNLP_TEST_HOME=~/stanfordnlp_test
+  - mkdir ~/stanza_test
+  - mkdir ~/stanza_test/in
+  - mkdir ~/stanza_test/out
+  - mkdir ~/stanza_test/scripts
+  - cp tests/data/external_server.properties ~/stanza_test/scripts
+  - cp tests/data/example_french.json ~/stanza_test/out
+  - export STANZA_TEST_HOME=~/stanza_test
 script:
   - python -m pytest -m travis tests/
diff --git a/README.md b/README.md
@@ -1,10 +1,10 @@
-# StanfordNLP: A Python NLP Library for Many Human Languages
+# Stanza: A Python NLP Library for Many Human Languages
 
 [![Travis Status](https://travis-ci.com/stanfordnlp/stanfordnlp.svg?token=RPNzRzNDQRoq2x3J2juj&branch=master)](https://travis-ci.com/stanfordnlp/stanfordnlp)
 [![PyPI Version](https://img.shields.io/pypi/v/stanfordnlp.svg?colorB=blue)](https://pypi.org/project/stanfordnlp/)
 ![Python Versions](https://img.shields.io/pypi/pyversions/stanfordnlp.svg?colorB=blue)
 
-The Stanford NLP Group's official Python NLP library. It contains packages for running our latest fully neural pipeline from the CoNLL 2018 Shared Task and for accessing the Java Stanford CoreNLP server. For detailed information please visit our [official website](https://stanfordnlp.github.io/stanfordnlp/).
+The Stanford NLP Group's official Python NLP library. It contains packages for running our latest fully neural pipeline from the CoNLL 2018 Shared Task and for accessing the Java Stanford CoreNLP server. For detailed information please visit our [official website](https://stanfordnlp.github.io/stanza/).
 
 ### References
 
@@ -31,45 +31,45 @@ If you use the CoreNLP server, please cite the CoreNLP software package and the
 
 ## Issues and Usage Q&A
 
-To ask questions, report issues or request features, please use the [GitHub Issue Tracker](https://github.com/stanfordnlp/stanfordnlp/issues).
+To ask questions, report issues or request features, please use the [GitHub Issue Tracker](https://github.com/stanfordnlp/stanza/issues).
 
 ## Setup
 
-StanfordNLP supports Python 3.6 or later. We strongly recommend that you install StanfordNLP from PyPI. If you already have [pip installed](https://pip.pypa.io/en/stable/installing/), simply run:
+Stanza supports Python 3.6 or later. We strongly recommend that you install Stanza from PyPI. If you already have [pip installed](https://pip.pypa.io/en/stable/installing/), simply run:
 ```bash
-pip install stanfordnlp
+pip install stanza
 ```
-this should also help resolve all of the dependencies of StanfordNLP, for instance [PyTorch](https://pytorch.org/) 1.0.0 or above.
+this should also help resolve all of the dependencies of Stanza, for instance [PyTorch](https://pytorch.org/) 1.0.0 or above.
 
-If you currently have a previous version of `stanfordnlp` installed, use:
+If you currently have a previous version of `stanza` installed, use:
 ```bash
-pip install stanfordnlp -U
+pip install stanza -U
 ```
 
-Alternatively, you can also install from source of this git repository, which will give you more flexibility in developing on top of StanfordNLP and training your own models. For this option, run
+Alternatively, you can also install from source of this git repository, which will give you more flexibility in developing on top of Stanza and training your own models. For this option, run
 ```bash
-git clone https://github.com/stanfordnlp/stanfordnlp.git
-cd stanfordnlp
+git clone https://github.com/stanfordnlp/stanza.git
+cd stanza
 pip install -e .
 ```
 
-## Running StanfordNLP
+## Running Stanza
 
 ### Getting Started with the neural pipeline
 
-To run your first StanfordNLP pipeline, simply following these steps in your Python interactive interpreter:
+To run your first Stanza pipeline, simply following these steps in your Python interactive interpreter:
 
 ```python
->>> import stanfordnlp
->>> stanfordnlp.download('en')   # This downloads the English models for the neural pipeline
+>>> import stanza
+>>> stanza.download('en')   # This downloads the English models for the neural pipeline
 # IMPORTANT: The above line prompts you before downloading, which doesn't work well in a Jupyter notebook.
-# To avoid a prompt when using notebooks, instead use: >>> stanfordnlp.download('en', force=True)
->>> nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
+# To avoid a prompt when using notebooks, instead use: >>> stanza.download('en', force=True)
+>>> nlp = stanza.Pipeline() # This sets up a default neural pipeline in English
 >>> doc = nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
 >>> doc.sentences[0].print_dependencies()
 ```
 
-The last command will print out the words in the first sentence in the input string (or `Document`, as it is represented in StanfordNLP), as well as the indices for the word that governs it in the Universal Dependencies parse of that sentence (its "head"), along with the dependency relation between the words. The output should look like:
+The last command will print out the words in the first sentence in the input string (or `Document`, as it is represented in Stanza), as well as the indices for the word that governs it in the Universal Dependencies parse of that sentence (its "head"), along with the dependency relation between the words. The output should look like:
 
 ```
 ('Barack', '4', 'nsubj:pass')
@@ -83,13 +83,13 @@ The last command will print out the words in the first sentence in the input str
 
 **Note:** If you are running into issues like `OSError: [Errno 22] Invalid argument`, it's very likely that you are affected by a [known Python issue](https://bugs.python.org/issue24658), and we would recommend Python 3.6.8 or later and Python 3.7.2 or later.
 
-We also provide a multilingual [demo script](https://github.com/stanfordnlp/stanfordnlp/blob/master/demo/pipeline_demo.py) that demonstrates how one uses StanfordNLP in other languages than English, for example Chinese (traditional)
+We also provide a multilingual [demo script](https://github.com/stanfordnlp/stanza/blob/master/demo/pipeline_demo.py) that demonstrates how one uses Stanza in other languages than English, for example Chinese (traditional)
 
 ```bash
 python demo/pipeline_demo.py -l zh
 ```
 
-See [our getting started guide](https://stanfordnlp.github.io/stanfordnlp/installation_usage.html#getting-started) for more details.
+See [our getting started guide](https://stanfordnlp.github.io/stanza/installation_usage.html#getting-started) for more details.
 
 ### Access to Java Stanford CoreNLP Server
 
@@ -101,7 +101,7 @@ There are a few initial setup steps.
 * Put the model jars in the distribution folder
 * Tell the python code where Stanford CoreNLP is located: `export CORENLP_HOME=/path/to/stanford-corenlp-full-2018-10-05`
 
-We provide another [demo script](https://github.com/stanfordnlp/stanfordnlp/blob/master/demo/corenlp.py) that shows how one can use the CoreNLP client and extract various annotations from it.
+We provide another [demo script](https://github.com/stanfordnlp/stanza/blob/master/demo/corenlp.py) that shows how one can use the CoreNLP client and extract various annotations from it.
 
 ### Online Colab Notebooks
 
@@ -110,11 +110,11 @@ To get your started, we also provide interactive Jupyter notebooks in the `demo`
 * Go to the [Google Colab website](https://colab.research.google.com)
 * Navigate to `File` -> `Open notebook`, and choose `GitHub` in the pop-up menu
 * Note that you do **not** need to give Colab access permission to your github account
-* Type `stanfordnlp/stanfordnlp` in the search bar, and click enter
+* Type `stanfordnlp/stanza` in the search bar, and click enter
 
 ### Trained Models for the Neural Pipeline
 
-We currently provide models for all of the treebanks in the CoNLL 2018 Shared Task. You can find instructions for downloading and using these models [here](https://stanfordnlp.github.io/stanfordnlp/models.html).
+We currently provide models for all of the treebanks in the CoNLL 2018 Shared Task. You can find instructions for downloading and using these models [here](https://stanfordnlp.github.io/stanza/models.html).
 
 ### Batching To Maximize Pipeline Speed
 
@@ -127,8 +127,8 @@ We are actively working on improving multi-document processing.
 
 All neural modules in this library, including the tokenizer, the multi-word token (MWT) expander, the POS/morphological features tagger, the lemmatizer and the dependency parser, can be trained with your own [CoNLL-U](https://universaldependencies.org/format.html) format data. Currently, we do not support model training via the `Pipeline` interface. Therefore, to train your own models, you need to clone this git repository and set up from source.
 
-For detailed step-by-step guidance on how to train and evaluate your own models, please visit our [training documentation](https://stanfordnlp.github.io/stanfordnlp/training.html).
+For detailed step-by-step guidance on how to train and evaluate your own models, please visit our [training documentation](https://stanfordnlp.github.io/stanza/training.html).
 
 ## LICENSE
 
-StanfordNLP is released under the Apache License, Version 2.0. See the [LICENSE](https://github.com/stanfordnlp/stanfordnlp/blob/master/LICENSE) file for more details.
+Stanza is released under the Apache License, Version 2.0. See the [LICENSE](https://github.com/stanfordnlp/stanza/blob/master/LICENSE) file for more details.
diff --git a/demo/corenlp.py b/demo/corenlp.py
@@ -1,4 +1,4 @@
-from stanfordnlp.server import CoreNLPClient
+from stanza.server import CoreNLPClient
 
 # example text
 print('---')

diff --git a/demo/pipeline_demo.py b/demo/pipeline_demo.py
@@ -6,14 +6,14 @@
 import argparse
 import os
 
-import stanfordnlp
-from stanfordnlp.utils.resources import DEFAULT_MODEL_DIR
+import stanza
+from stanza.utils.resources import DEFAULT_MODEL_DIR
 
 
 if __name__ == '__main__':
     # get arguments
     parser = argparse.ArgumentParser()
-    parser.add_argument('-d', '--models_dir', help='location of models files | default: ~/stanfordnlp_resources',
+    parser.add_argument('-d', '--models_dir', help='location of models files | default: ~/stanza_resources',
                         default=DEFAULT_MODEL_DIR)
     parser.add_argument('-l', '--lang', help='Demo language',
                         default="en")
@@ -30,11 +30,11 @@
         sys.exit(1)
 
     # download the models
-    stanfordnlp.download(args.lang, args.models_dir, confirm_if_exists=True)
+    stanza.download(args.lang, args.models_dir, confirm_if_exists=True)
     # set up a pipeline
     print('---')
     print('Building pipeline...')
-    pipeline = stanfordnlp.Pipeline(models_dir=args.models_dir, lang=args.lang, use_gpu=(not args.cpu))
+    pipeline = stanza.Pipeline(models_dir=args.models_dir, lang=args.lang, use_gpu=(not args.cpu))
     # process the document
     doc = pipeline(example_sentences[args.lang])
     # access nlp annotations

diff --git a/scripts/config.sh b/scripts/config.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 #
-# Set environment variables for the training and testing of stanfordnlp modules.
+# Set environment variables for the training and testing of stanza modules.
 
 # Set UDBASE to the location of UD data folder
 # The data should be CoNLL-U format

diff --git a/scripts/lang2code.py b/scripts/lang2code.py
@@ -3,7 +3,7 @@
 """
 import sys
 
-from stanfordnlp.models.common.constant import lang2lcode
+from stanza.models.common.constant import lang2lcode
 
 if len(sys.argv) <= 1:
     raise Exception("Language name not provided.")

diff --git a/scripts/prep_depparse_data.sh b/scripts/prep_depparse_data.sh
@@ -48,15 +48,15 @@ elif [ $tag_type == 'predicted' ]; then
     # run part-of-speech tagging on the train file
     echo '---'
     echo 'running part of speech model to generate predicted tags for train data'
-    train_cmd='python -m stanfordnlp.models.tagger --wordvec_dir '${WORDVEC_DIR}' --eval_file '${gold_train_file}' --gold_file '${gold_train_file}' --output_file '${train_in_file}' --lang '${original_short}' --shorthand '${original_short}' --batch_size '${batch_size}' --mode predict'
+    train_cmd='python -m stanza.models.tagger --wordvec_dir '${WORDVEC_DIR}' --eval_file '${gold_train_file}' --gold_file '${gold_train_file}' --output_file '${train_in_file}' --lang '${original_short}' --shorthand '${original_short}' --batch_size '${batch_size}' --mode predict'
     echo ''
     echo $train_cmd
     echo ''
     eval $train_cmd
     # run part-of-speech tagging on the train file
     echo '---'
     echo 'running part of speech model to generate predicted tags for dev data'
-    dev_cmd='python -m stanfordnlp.models.tagger --wordvec_dir '${WORDVEC_DIR}' --eval_file '${gold_dev_file}' --gold_file '${gold_dev_file}' --output_file '${dev_in_file}' --lang '${original_short}' --shorthand '${original_short}' --batch_size '${batch_size}' --mode predict'
+    dev_cmd='python -m stanza.models.tagger --wordvec_dir '${WORDVEC_DIR}' --eval_file '${gold_dev_file}' --gold_file '${gold_dev_file}' --output_file '${dev_in_file}' --lang '${original_short}' --shorthand '${original_short}' --batch_size '${batch_size}' --mode predict'
     echo ''
     echo $dev_cmd
     eval $dev_cmd

diff --git a/scripts/prep_mwt_data.sh b/scripts/prep_mwt_data.sh
@@ -37,7 +37,7 @@ fi
 
 if [ -e $dev_conllu ]; then
     echo "Preparing dev data..."
-    python stanfordnlp/utils/contract_mwt.py $dev_conllu $dev_in_file
+    python stanza/utils/contract_mwt.py $dev_conllu $dev_in_file
     bash scripts/prep_tokenize_data.sh $src_treebank dev
 else
     touch $dev_in_file

diff --git a/scripts/prep_ner_data.sh b/scripts/prep_ner_data.sh
@@ -23,17 +23,17 @@ test_json_file=$NER_DATA_DIR/${short}.test.json
 
 # create json file if exists; otherwise create empty files
 if [ -e $train_file ]; then
-    python stanfordnlp/utils/prepare_ner_data.py $train_file $train_json_file
+    python stanza/utils/prepare_ner_data.py $train_file $train_json_file
 else
     touch $train_json_file
 fi
 if [ -e $dev_file ]; then
-    python stanfordnlp/utils/prepare_ner_data.py $dev_file $dev_json_file
+    python stanza/utils/prepare_ner_data.py $dev_file $dev_json_file
 else
     touch $dev_json_file
 fi
 if [ -e $test_file ]; then
-    python stanfordnlp/utils/prepare_ner_data.py $test_file $test_json_file
+    python stanza/utils/prepare_ner_data.py $test_file $test_json_file
 else
     touch $test_json_file
 fi

diff --git a/scripts/prep_tokenize_data.sh b/scripts/prep_tokenize_data.sh
@@ -24,10 +24,10 @@ short=`bash scripts/treebank_to_shorthand.sh ud $treebank`
 
 lang=`echo $short | sed -e 's#_.*##g'`
 echo "Preparing tokenizer $dataset data..."
-python stanfordnlp/utils/prepare_tokenizer_data.py $UDBASE/$treebank/${short}-ud-${dataset}.txt $UDBASE/$treebank/${short}-ud-${dataset}.conllu -o ${TOKENIZE_DATA_DIR}/${short}-ud-${dataset}.toklabels -m ${TOKENIZE_DATA_DIR}/${short}-ud-${dataset}-mwt.json
+python stanza/utils/prepare_tokenizer_data.py $UDBASE/$treebank/${short}-ud-${dataset}.txt $UDBASE/$treebank/${short}-ud-${dataset}.conllu -o ${TOKENIZE_DATA_DIR}/${short}-ud-${dataset}.toklabels -m ${TOKENIZE_DATA_DIR}/${short}-ud-${dataset}-mwt.json
 cp $UDBASE/$treebank/${short}-ud-${dataset}.conllu ${TOKENIZE_DATA_DIR}/${short}.${dataset}.gold.conllu
 cp $UDBASE/$treebank/${short}-ud-${dataset}.txt ${TOKENIZE_DATA_DIR}/${short}.${dataset}.txt
 # handle Vietnamese data
 if [ $lang == "vi" ]; then
-    python stanfordnlp/utils/postprocess_vietnamese_tokenizer_data.py $UDBASE/$treebank/${short}-ud-${dataset}.txt --char_level_pred ${TOKENIZE_DATA_DIR}/${short}-ud-${dataset}.toklabels -o ${TOKENIZE_DATA_DIR}/${short}-ud-${dataset}.json
+    python stanza/utils/postprocess_vietnamese_tokenizer_data.py $UDBASE/$treebank/${short}-ud-${dataset}.txt --char_level_pred ${TOKENIZE_DATA_DIR}/${short}-ud-${dataset}.toklabels -o ${TOKENIZE_DATA_DIR}/${short}-ud-${dataset}.json
 fi
diff --git a/scripts/run_charlm.sh b/scripts/run_charlm.sh
@@ -21,9 +21,9 @@ dev_file=${CHARLM_DATA_DIR}/${lang}/${corpus_name}/dev.txt
 test_file=${CHARLM_DATA_DIR}/${lang}/${corpus_name}/test.txt
 
 echo "Running charlm for $lang:$corpus with $args..."
-python -m stanfordnlp.models.charlm --train_dir $train_dir --eval_file $dev_file \
+python -m stanza.models.charlm --train_dir $train_dir --eval_file $dev_file \
     --direction $direction --lang $lang --shorthand $short --mode train $args
-python -m stanfordnlp.models.charlm --eval_file $dev_file \
+python -m stanza.models.charlm --eval_file $dev_file \
     --direction $direction --lang $lang --shorthand $short --mode predict $args
-python -m stanfordnlp.models.charlm --eval_file $test_file \
+python -m stanza.models.charlm --eval_file $test_file \
     --direction $direction --lang $lang --shorthand $short --mode predict $args
diff --git a/scripts/run_depparse.sh b/scripts/run_depparse.sh
@@ -35,10 +35,10 @@ fi
 echo "Using batch size $batch_size"
 
 echo "Running parser with $args..."
-python -m stanfordnlp.models.parser --wordvec_dir $WORDVEC_DIR --train_file $train_file --eval_file $eval_file \
+python -m stanza.models.parser --wordvec_dir $WORDVEC_DIR --train_file $train_file --eval_file $eval_file \
     --output_file $output_file --gold_file $gold_file --lang $lang --shorthand $short --batch_size $batch_size --mode train $args
-python -m stanfordnlp.models.parser --wordvec_dir $WORDVEC_DIR --eval_file $eval_file \
+python -m stanza.models.parser --wordvec_dir $WORDVEC_DIR --eval_file $eval_file \
     --output_file $output_file --gold_file $gold_file --lang $lang --shorthand $short --mode predict $args
-results=`python stanfordnlp/utils/conll18_ud_eval.py -v $gold_file $output_file | head -12 | tail -n+12 | awk '{print $7}'`
+results=`python stanza/utils/conll18_ud_eval.py -v $gold_file $output_file | head -12 | tail -n+12 | awk '{print $7}'`
 echo $results $args >> ${DEPPARSE_DATA_DIR}/${short}.results
 echo $short $results $args