From 7caedfc150f916de302297406c45dead27b475ba Mon Sep 17 00:00:00 2001
From: Jimmy Lin <jimmylin@uwaterloo.ca>
Date: Sat, 2 Jan 2021 11:17:09 -0500
Subject: [PATCH] Updated documentation about pre-built indexes (#288)

---
 docs/prebuilt-indexes.md  | 40 ++++++++++++++++++++++++++++++++++++---
 docs/usage-indexreader.md |  3 +++
 2 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/docs/prebuilt-indexes.md b/docs/prebuilt-indexes.md
index a2758f8c6..836b5a282 100644
--- a/docs/prebuilt-indexes.md
+++ b/docs/prebuilt-indexes.md
@@ -1,13 +1,47 @@
 # Pyserini: Prebuilt Indexes
 
 Pre-built Anserini indexes are hosted at the University of Waterloo's [GitLab](https://git.uwaterloo.ca/jimmylin/anserini-indexes) and mirrored on Dropbox.
-The following method will list available pre-built indexes:
+The following methods will list available pre-built indexes:
 
-```
+```python
+from pyserini.search import SimpleSearcher
 SimpleSearcher.list_prebuilt_indexes()
+
+from pyserini.index import IndexReader
+IndexReader.list_prebuilt_indexes()
+```
+
+It's easy to initialize a searcher from a pre-built index:
+
+```python
+searcher = SimpleSearcher.from_prebuilt_index('robust04')
+```
+
+You can use this simple Python one-liner to download the pre-built index:
+
+```bash
+python -c "from pyserini.search import SimpleSearcher; SimpleSearcher.from_prebuilt_index('robust04')"
 ```
 
-Below is a summary of what's currently available:
+The downloaded index will be in `~/.cache/pyserini/indexes/`.
+
+It's similarly easy to initialize an index reader from a pre-built index:
+
+```python
+index_reader = IndexReader.from_prebuilt_index('robust04')
+index_reader.stats()
+```
+
+The output will be:
+
+```
+{'total_terms': 174540872, 'documents': 528030, 'non_empty_documents': 528030, 'unique_terms': 923436}
+```
+
+Note that unless the underlying index was built with the `-optimize` option (i.e., merging all index segments into a single segment), `unique_terms` will show -1.
+This is expected behavior, not a bug.
+
+Below is a summary of the pre-built indexes that are currently available.
 
 ## MS MARCO Indexes
 
diff --git a/docs/usage-indexreader.md b/docs/usage-indexreader.md
index af846b09d..72a77b5c7 100644
--- a/docs/usage-indexreader.md
+++ b/docs/usage-indexreader.md
@@ -162,3 +162,6 @@ Output is something like this:
  'non_empty_documents': 528030,
  'unique_terms': 923436}
 ```
+
+Note that unless the underlying index was built with the `-optimize` option (i.e., merging all index segments into a single segment), `unique_terms` will show -1.
+This is expected behavior, not a bug.