Update docs and install extras [ci skip]
ines committed Oct 8, 2020
1 parent eb28e8c commit 43e59bb
Showing 13 changed files with 62 additions and 44 deletions.
2 changes: 2 additions & 0 deletions setup.cfg
@@ -68,6 +68,8 @@ lookups =
spacy_lookups_data>=1.0.0rc0,<1.0.0
transformers =
spacy_transformers>=1.0.0a17,<1.0.0
+ray =
+spacy_ray>=0.0.1,<1.0.0
cuda =
cupy>=5.0.0b4,<9.0.0
cuda80 =
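This hunk adds a `ray` extra next to the existing ones, and each extra's name is what goes inside the brackets of a `pip install spacy[...]` command. As a minimal sketch, the mapping from extra to requirements can be read back with Python's standard `configparser`. It assumes the extras live under setuptools' `[options.extras_require]` section with indented continuation lines (neither the section header nor the indentation survives in this view); `parse_extras` and `SAMPLE` are illustrative names, not part of spaCy.

```python
import configparser
from typing import Dict, List


def parse_extras(setup_cfg_text: str) -> Dict[str, List[str]]:
    """Map each extra in [options.extras_require] to its requirement strings."""
    parser = configparser.ConfigParser()
    parser.read_string(setup_cfg_text)
    if not parser.has_section("options.extras_require"):
        return {}
    extras = {}
    for name, value in parser.items("options.extras_require"):
        # A multi-line value arrives as one string; keep its non-empty lines.
        extras[name] = [line.strip() for line in value.splitlines() if line.strip()]
    return extras


# Excerpt of the extras shown in the hunk, with assumed indentation.
SAMPLE = """
[options.extras_require]
lookups =
    spacy_lookups_data>=1.0.0rc0,<1.0.0
transformers =
    spacy_transformers>=1.0.0a17,<1.0.0
ray =
    spacy_ray>=0.0.1,<1.0.0
"""

print(parse_extras(SAMPLE)["ray"])  # ['spacy_ray>=0.0.1,<1.0.0']
```

Each key in the returned dict is the name that goes in brackets, e.g. `pip install spacy[ray]`.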
24 changes: 12 additions & 12 deletions website/docs/api/transformer.md
@@ -11,7 +11,7 @@ api_string_name: transformer
> #### Installation
>
> ```bash
-> $ pip install spacy-transformers
+> $ pip install -U %%SPACY_PKG_NAME[transformers] %%SPACY_PKG_FLAGS
> ```
<Infobox title="Important note" variant="warning">
@@ -385,12 +385,12 @@ are wrapped into the
by this class. Instances of this class are typically assigned to the
[`Doc._.trf_data`](/api/transformer#custom-attributes) extension attribute.
| Name | Description |
| --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `tokens` | A slice of the tokens data produced by the tokenizer. This may have several fields, including the token IDs, the texts and the attention mask. See the [`transformers.BatchEncoding`](https://huggingface.co/transformers/main_classes/tokenizer.html#transformers.BatchEncoding) object for details. ~~dict~~ |
| Name | Description |
| --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `tokens` | A slice of the tokens data produced by the tokenizer. This may have several fields, including the token IDs, the texts and the attention mask. See the [`transformers.BatchEncoding`](https://huggingface.co/transformers/main_classes/tokenizer.html#transformers.BatchEncoding) object for details. ~~dict~~ |
| `tensors` | The activations for the `Doc` from the transformer. Usually the last tensor that is 3-dimensional will be the most important, as that will provide the final hidden state. Generally activations that are 2-dimensional will be attention weights. Details of this variable will differ depending on the underlying transformer model. ~~List[FloatsXd]~~ |
-| `align` | Alignment from the `Doc`'s tokenization to the wordpieces. This is a ragged array, where `align.lengths[i]` indicates the number of wordpiece tokens that token `i` aligns against. The actual indices are provided at `align[i].dataXd`. ~~Ragged~~ |
-| `width` | The width of the last hidden layer. ~~int~~ |
+| `align` | Alignment from the `Doc`'s tokenization to the wordpieces. This is a ragged array, where `align.lengths[i]` indicates the number of wordpiece tokens that token `i` aligns against. The actual indices are provided at `align[i].dataXd`. ~~Ragged~~ |
+| `width` | The width of the last hidden layer. ~~int~~ |
### TransformerData.empty {#transformerdata-emoty tag="classmethod"}
@@ -406,13 +406,13 @@ Holds a batch of input and output objects for a transformer model. The data can
then be split to a list of [`TransformerData`](/api/transformer#transformerdata)
objects to associate the outputs to each [`Doc`](/api/doc) in the batch.
| Name | Description |
| ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Name | Description |
| ---------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `spans` | The batch of input spans. The outer list refers to the Doc objects in the batch, and the inner list are the spans for that `Doc`. Note that spans are allowed to overlap or exclude tokens, but each `Span` can only refer to one `Doc` (by definition). This means that within a `Doc`, the regions of the output tensors that correspond to each `Span` may overlap or have gaps, but for each `Doc`, there is a non-overlapping contiguous slice of the outputs. ~~List[List[Span]]~~ |
-| `tokens` | The output of the tokenizer. ~~transformers.BatchEncoding~~ |
-| `tensors` | The output of the transformer model. ~~List[torch.Tensor]~~ |
-| `align` | Alignment from the spaCy tokenization to the wordpieces. This is a ragged array, where `align.lengths[i]` indicates the number of wordpiece tokens that token `i` aligns against. The actual indices are provided at `align[i].dataXd`. ~~Ragged~~ |
-| `doc_data` | The outputs, split per `Doc` object. ~~List[TransformerData]~~ |
+| `tokens` | The output of the tokenizer. ~~transformers.BatchEncoding~~ |
+| `tensors` | The output of the transformer model. ~~List[torch.Tensor]~~ |
+| `align` | Alignment from the spaCy tokenization to the wordpieces. This is a ragged array, where `align.lengths[i]` indicates the number of wordpiece tokens that token `i` aligns against. The actual indices are provided at `align[i].dataXd`. ~~Ragged~~ |
+| `doc_data` | The outputs, split per `Doc` object. ~~List[TransformerData]~~ |
### FullTransformerBatch.unsplit_by_doc {#fulltransformerbatch-unsplit_by_doc tag="method"}
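Both tables describe `align` as a ragged array: a flat array of wordpiece indices plus a `lengths` array saying how many of them belong to each spaCy token. The sketch below is a pure-Python stand-in for that layout (the hypothetical `MiniRagged` class, not thinc's `Ragged`), so it runs without thinc or NumPy. It also shows one common way to get per-token vectors from wordpiece activations, averaging the aligned rows; `mean_pool` is illustrative and is not spacy-transformers' own pooling code.

```python
from itertools import accumulate
from typing import List, Sequence


class MiniRagged:
    """Hypothetical stand-in for a ragged array: flat data plus per-row lengths."""

    def __init__(self, data: Sequence[int], lengths: Sequence[int]) -> None:
        if sum(lengths) != len(data):
            raise ValueError("lengths must sum to len(data)")
        self.data = list(data)
        self.lengths = list(lengths)
        # starts[i] is the offset of row i in the flat data list.
        self.starts = [0] + list(accumulate(self.lengths))[:-1]

    def row(self, i: int) -> List[int]:
        """Wordpiece indices aligned to token i (the role of align[i].dataXd)."""
        start = self.starts[i]
        return self.data[start : start + self.lengths[i]]


def mean_pool(vectors: List[List[float]], align: MiniRagged) -> List[List[float]]:
    """Average the wordpiece vectors aligned to each token."""
    width = len(vectors[0]) if vectors else 0
    pooled = []
    for i in range(len(align.lengths)):
        rows = [vectors[j] for j in align.row(i)]
        if rows:
            pooled.append([sum(col) / len(rows) for col in zip(*rows)])
        else:
            # Tokens aligned to no wordpiece get zeros (a choice of this sketch).
            pooled.append([0.0] * width)
    return pooled


# Two tokens, "Hello" and "wordpieces", over six wordpieces:
# [CLS] hello word ##piece ##s [SEP]  (indices 0..5, hypothetical split)
align = MiniRagged(data=[1, 2, 3, 4], lengths=[1, 3])
vectors = [[0, 0], [1, 1], [2, 0], [4, 0], [6, 0], [0, 0]]
print(align.row(1))               # [2, 3, 4]
print(mean_pool(vectors, align))  # [[1.0, 1.0], [4.0, 0.0]]
```

Storing one flat array plus lengths, rather than a list of lists, is what lets tokens with different numbers of wordpieces share a single contiguous buffer.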
3 changes: 1 addition & 2 deletions website/docs/usage/embeddings-transformers.md
@@ -216,8 +216,7 @@ in `/opt/nvidia/cuda`, you would run:
```bash
### Installation with CUDA
$ export CUDA_PATH="/opt/nvidia/cuda"
-$ pip install cupy-cuda102
-$ pip install spacy-transformers
+$ pip install -U %%SPACY_PKG_NAME[cud102,transformers]%%SPACY_PKG_FLAGS
```

### Runtime usage {#transformers-runtime}
27 changes: 14 additions & 13 deletions website/docs/usage/index.md
@@ -47,7 +47,7 @@ Before you install spaCy and its dependencies, make sure that your `pip`,
```bash
$ pip install -U pip setuptools wheel
-$ pip install -U spacy
+$ pip install -U %%SPACY_PKG_NAME%%SPACY_PKG_FLAGS
```
When using pip it is generally recommended to install packages in a virtual
@@ -57,7 +57,7 @@ environment to avoid modifying system state:
$ python -m venv .env
$ source .env/bin/activate
$ pip install -U pip setuptools wheel
-$ pip install spacy
+$ pip install -U %%SPACY_PKG_NAME%%SPACY_PKG_FLAGS
```

spaCy also lets you install extra dependencies by specifying the following
@@ -68,15 +68,16 @@ spaCy's [`setup.cfg`](%%GITHUB_SPACY/setup.cfg) for details on what's included.
> #### Example
>
> ```bash
-> $ pip install spacy[lookups,transformers]
+> $ pip install %%SPACY_PKG_NAME[lookups,transformers]%%SPACY_PKG_FLAGS
> ```
-| Name | Description |
-| ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `lookups` | Install [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) for data tables for lemmatization and lexeme normalization. The data is serialized with trained pipelines, so you only need this package if you want to train your own models. |
-| `transformers` | Install [`spacy-transformers`](https://github.com/explosion/spacy-transformers). The package will be installed automatically when you install a transformer-based pipeline. |
-| `cuda`, ... | Install spaCy with GPU support provided by [CuPy](https://cupy.chainer.org) for your given CUDA version. See the GPU [installation instructions](#gpu) for details and options. |
-| `ja`, `ko`, `th` | Install additional dependencies required for tokenization for the [languages](/usage/models#languages). |
+| Name | Description |
+| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `lookups` | Install [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) for data tables for lemmatization and lexeme normalization. The data is serialized with trained pipelines, so you only need this package if you want to train your own models. |
+| `transformers` | Install [`spacy-transformers`](https://github.com/explosion/spacy-transformers). The package will be installed automatically when you install a transformer-based pipeline. |
+| `ray` | Install [`spacy-ray`](https://github.com/explosion/spacy-ray) to add CLI commands for [parallel training](/usage/training#parallel-training). |
+| `cuda`, ... | Install spaCy with GPU support provided by [CuPy](https://cupy.chainer.org) for your given CUDA version. See the GPU [installation instructions](#gpu) for details and options. |
+| `ja`, `ko`, `th`, `zh` | Install additional dependencies required for tokenization for the [languages](/usage/models#languages). |
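Which of these extras are already present can be checked without importing them. A minimal sketch using the standard `importlib.util.find_spec`; the extra-to-module mapping in `EXTRA_MODULES` is an assumption based on the package names in the table (for example `spacy-lookups-data` importing as `spacy_lookups_data`, and the CUDA extras providing `cupy`), it leaves out the language extras, and `installed_extras` is a hypothetical helper, not a spaCy command.

```python
import importlib.util
from typing import Dict, Optional

# Assumed import names for the main package behind each extra.
EXTRA_MODULES: Dict[str, str] = {
    "lookups": "spacy_lookups_data",
    "transformers": "spacy_transformers",
    "ray": "spacy_ray",
    "cuda": "cupy",
}


def installed_extras(extra_modules: Optional[Dict[str, str]] = None) -> Dict[str, bool]:
    """Report, per extra, whether its top-level package can be found (not imported)."""
    modules = EXTRA_MODULES if extra_modules is None else extra_modules
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in modules.items()
    }


if __name__ == "__main__":
    for extra, present in installed_extras().items():
        print(f"{extra:<13} {'installed' if present else 'missing'}")
```

`find_spec` only locates a top-level module, so a heavy dependency such as `cupy` is not actually loaded by this check.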
### conda {#conda}
@@ -88,8 +89,8 @@ $ conda install -c conda-forge spacy
```
For the feedstock including the build recipe and configuration, check out
-[this repository](https://github.com/conda-forge/spacy-feedstock). Improvements
-and pull requests to the recipe and setup are always appreciated.
+[this repository](https://github.com/conda-forge/spacy-feedstock). Note that we
+currently don't publish any [pre-releases](#changelog-pre) on conda.
### Upgrading spaCy {#upgrading}
@@ -116,7 +117,7 @@ are printed. It's recommended to run the command with `python -m` to make sure
you're executing the correct version of spaCy.
```cli
-$ pip install -U spacy
+$ pip install -U %%SPACY_PKG_NAME%%SPACY_PKG_FLAGS
$ python -m spacy validate
```
@@ -134,7 +135,7 @@ specifier allows cupy to be installed via wheel, saving some compilation time.
The specifiers should install [`cupy`](https://cupy.chainer.org).
```bash
-$ pip install -U spacy[cuda92]
+$ pip install -U %%SPACY_PKG_NAME[cuda92]%%SPACY_PKG_FLAGS
```
Once you have a GPU-enabled installation, the best way to activate it is to call
9 changes: 6 additions & 3 deletions website/docs/usage/linguistic-features.md
@@ -166,7 +166,7 @@ lookup lemmatizer looks up the token surface form in the lookup table without
reference to the token's part-of-speech or context.

```python
-# pip install spacy-lookups-data
+# pip install -U %%SPACY_PKG_NAME[lookups]%%SPACY_PKG_FLAGS
import spacy

nlp = spacy.blank("sv")
@@ -181,7 +181,7 @@ rule-based lemmatizer can be added using rule tables from
[`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data):

```python
-# pip install spacy-lookups-data
+# pip install -U %%SPACY_PKG_NAME[lookups]%%SPACY_PKG_FLAGS
import spacy

nlp = spacy.blank("de")
@@ -1801,7 +1801,10 @@ print(doc2[5].tag_, doc2[5].pos_) # WP PRON
<Infobox variant="warning" title="Migrating from spaCy v2.x">
-The [`AttributeRuler`](/api/attributeruler) can import a **tag map and morph rules** in the v2.x format via its built-in methods or when the component is initialized before training. See the [migration guide](/usage/v3#migrating-training-mappings-exceptions) for details.
+The [`AttributeRuler`](/api/attributeruler) can import a **tag map and morph
+rules** in the v2.x format via its built-in methods or when the component is
+initialized before training. See the
+[migration guide](/usage/v3#migrating-training-mappings-exceptions) for details.
</Infobox>
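The lookup mode described in the first hunk of this file maps each surface form straight to a lemma, without reference to part-of-speech or context. A minimal pure-Python sketch of that idea follows; `lookup_lemmatize` and the two-entry Swedish `TABLE` are hypothetical stand-ins for spaCy's component and the tables shipped in `spacy-lookups-data`, and returning unknown forms unchanged is this sketch's own choice.

```python
from typing import Dict, List


def lookup_lemmatize(tokens: List[str], table: Dict[str, str]) -> List[str]:
    """Map each surface form to its lemma using the table alone."""
    # No part-of-speech tag or surrounding context is consulted, so the same
    # string always gets the same lemma. Unknown forms are returned unchanged.
    return [table.get(token, token) for token in tokens]


# Hypothetical two-entry Swedish table; real tables come from spacy-lookups-data.
TABLE = {"hundarna": "hund", "sprang": "springa"}
print(lookup_lemmatize(["hundarna", "sprang", "snabbt"], TABLE))
# ['hund', 'springa', 'snabbt']
```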
4 changes: 2 additions & 2 deletions website/docs/usage/models.md
@@ -54,7 +54,7 @@ contribute to development.
> separately in the same environment:
>
> ```bash
-> $ pip install spacy[lookups]
+> $ pip install -U %%SPACY_PKG_NAME[lookups]%%SPACY_PKG_FLAGS
> ```
import Languages from 'widgets/languages.js'
@@ -287,7 +287,7 @@ The download command will [install the package](/usage/models#download-pip) via
pip and place the package in your `site-packages` directory.
```cli
-$ pip install -U spacy
+$ pip install -U %%SPACY_PKG_NAME%%SPACY_PKG_FLAGS
$ python -m spacy download en_core_web_sm
```
4 changes: 2 additions & 2 deletions website/docs/usage/projects.md
@@ -813,7 +813,7 @@ full embedded visualizer, as well as individual components.
> #### Installation
>
> ```bash
-> $ pip install "spacy-streamlit>=1.0.0a0"
+> $ pip install spacy-streamlit --pre
> ```
![](../images/spacy-streamlit.png)
@@ -911,7 +911,7 @@ https://github.com/explosion/projects/blob/v3/integrations/fastapi/scripts/main.
> #### Installation
>
> ```cli
-> $ pip install spacy-ray
+> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS
> # Check that the CLI is registered
> $ python -m spacy ray --help
> ```
2 changes: 1 addition & 1 deletion website/docs/usage/training.md
@@ -1249,7 +1249,7 @@ valid.
> #### Installation
>
> ```cli
-> $ pip install spacy-ray
+> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS
> # Check that the CLI is registered
> $ python -m spacy ray --help
> ```
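Throughout this commit, literal package names in install commands are replaced by the `%%SPACY_PKG_NAME` and `%%SPACY_PKG_FLAGS` placeholders, presumably so the same docs source can show either a stable or a pre-release install command. How the website fills them in is not part of this diff. The sketch below is a plain string substitution with assumed values (`spacy` and no flags by default, or a nightly package name plus pip's `--pre` flag for a pre-release), and `render_command` is a hypothetical helper.

```python
from typing import Dict

# Assumed values; the real ones are configured by the website build.
STABLE: Dict[str, str] = {"%%SPACY_PKG_NAME": "spacy", "%%SPACY_PKG_FLAGS": ""}
# Note the leading space: most templates in this diff put the flags directly
# after the package name or the closing bracket of the extras.
PRERELEASE: Dict[str, str] = {
    "%%SPACY_PKG_NAME": "spacy-nightly",
    "%%SPACY_PKG_FLAGS": " --pre",
}


def render_command(template: str, values: Dict[str, str] = STABLE) -> str:
    """Replace each docs placeholder in a command template with its value."""
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


cmd = "pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS"
print(render_command(cmd))              # pip install -U spacy[ray]
print(render_command(cmd, PRERELEASE))  # pip install -U spacy-nightly[ray] --pre
```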