Add to onboarding reproduction logs (#2027)
Samantha-Zhan authored Nov 18, 2024
1 parent a95b0e0 commit f6f8ecc
Showing 4 changed files with 11 additions and 8 deletions.
7 changes: 4 additions & 3 deletions docs/conceptual-framework.md
Original file line number Diff line number Diff line change
@@ -131,13 +131,13 @@ So, the indexing phase with Lucene, as in the [previous exercise](experiments-ms
Conceptually, you've computed the BM25 document vector of every document in the collection and stored them in a data structure called an inverted index.
(Actually, in reality, the inverted index only stores component statistics that allow you to reconstruct the BM25 document vectors.)

-With the `IndexReader` class in Pyserini, you can materialize (i.e., reconstruct) the BM25 document vector for a particular document:
+With the `LuceneIndexReader` class in Pyserini, you can materialize (i.e., reconstruct) the BM25 document vector for a particular document:

```python
-from pyserini.index.lucene import IndexReader
+from pyserini.index.lucene import LuceneIndexReader
import json

-index_reader = IndexReader('indexes/lucene-index-msmarco-passage')
+index_reader = LuceneIndexReader('indexes/lucene-index-msmarco-passage')
tf = index_reader.get_document_vector('7187158')
bm25_weights = \
{term: index_reader.compute_bm25_term_weight('7187158', term, analyzer=None) \
@@ -368,3 +368,4 @@ Before you move on, however, add an entry in the "Reproduction Log" at the botto
+ Results reproduced by [@alirezaJvh](https://github.com/alirezaJvh) on 2024-10-05 (commit [`3f76099`](https://github.com/castorini/pyserini/commit/3f76099a73820afee12496c0354d52ca6a6175c2))
+ Results reproduced by [@Raghav0005](https://github.com/Raghav0005) on 2024-10-09 (commit [`7ed8369`](https://github.com/castorini/pyserini/commit/7ed83698298139efdfd62b6893d673aa367b4ac8))
+ Results reproduced by [@Pxlin-09](https://github.com/pxlin-09) on 2024-10-26 (commit [`af2d3c5`](https://github.com/castorini/pyserini/commit/af2d3c52953b916e242142dbcf4799ecdb9abbee))
+ Results reproduced by [@Samantha-Zhan](https://github.com/Samantha-Zhan) on 2024-11-17 (commit [`a95b0e0`](https://github.com/castorini/pyserini/commit/a95b0e04a1636e0f4151197c235c961b3c832802))
9 changes: 5 additions & 4 deletions docs/conceptual-framework2.md
@@ -417,10 +417,10 @@ We did exactly the same thing for [the MS MARCO passage ranking test collection]
Next, let's generate the BM25 document vector for doc `MED-4555`, the same document we examined above.

```python
-from pyserini.index.lucene import IndexReader
+from pyserini.index.lucene import LuceneIndexReader
import json

-index_reader = IndexReader('indexes/lucene.nfcorpus')
+index_reader = LuceneIndexReader('indexes/lucene.nfcorpus')
tf = index_reader.get_document_vector('MED-4555')
bm25_weights = \
{term: index_reader.compute_bm25_term_weight('MED-4555', term, analyzer=None) \
@@ -472,11 +472,11 @@ With this setup, we can now perform end-to-end retrieval for a query "by hand",

```python
from pyserini.search.lucene import LuceneSearcher
-from pyserini.index.lucene import IndexReader
+from pyserini.index.lucene import LuceneIndexReader
from tqdm import tqdm

searcher = LuceneSearcher('indexes/lucene.nfcorpus')
-index_reader = IndexReader('indexes/lucene.nfcorpus')
+index_reader = LuceneIndexReader('indexes/lucene.nfcorpus')

scores = []
# Iterate through all docids in the index.
@@ -611,3 +611,4 @@ Before you move on, however, add an entry in the "Reproduction Log" at the botto
+ Results reproduced by [@alirezaJvh](https://github.com/alirezaJvh) on 2024-10-05 (commit [`3f76099`](https://github.com/castorini/pyserini/commit/3f76099a73820afee12496c0354d52ca6a6175c2))
+ Results reproduced by [@Raghav0005](https://github.com/Raghav0005) on 2024-10-09 (commit [`7ed8369`](https://github.com/castorini/pyserini/commit/7ed83698298139efdfd62b6893d673aa367b4ac8))
+ Results reproduced by [@Pxlin-09](https://github.com/pxlin-09) on 2024-10-26 (commit [`af2d3c5`](https://github.com/castorini/pyserini/commit/af2d3c52953b916e242142dbcf4799ecdb9abbee))
+ Results reproduced by [@Samantha-Zhan](https://github.com/Samantha-Zhan) on 2024-11-17 (commit [`a95b0e0`](https://github.com/castorini/pyserini/commit/a95b0e04a1636e0f4151197c235c961b3c832802))
1 change: 1 addition & 0 deletions docs/experiments-msmarco-passage.md
@@ -406,3 +406,4 @@ Before you move on, however, add an entry in the "Reproduction Log" at the botto
+ Results reproduced by [@alirezaJvh](https://github.com/alirezaJvh) on 2024-10-05 (commit [`3f76099`](https://github.com/castorini/pyserini/commit/3f76099a73820afee12496c0354d52ca6a6175c2))
+ Results reproduced by [@Raghav0005](https://github.com/Raghav0005) on 2024-10-07 (commit [`7ed8369`](https://github.com/castorini/pyserini/commit/7ed83698298139efdfd62b6893d673aa367b4ac8))
+ Results reproduced by [@Pxlin-09](https://github.com/pxlin-09) on 2024-10-26 (commit [`af2d3c5`](https://github.com/castorini/pyserini/commit/af2d3c52953b916e242142dbcf4799ecdb9abbee))
+ Results reproduced by [@Samantha-Zhan](https://github.com/Samantha-Zhan) on 2024-11-17 (commit [`a95b0e0`](https://github.com/castorini/pyserini/commit/a95b0e04a1636e0f4151197c235c961b3c832802))
2 changes: 1 addition & 1 deletion docs/experiments-nfcorpus.md
@@ -406,4 +406,4 @@ Before you move on, however, add an entry in the "Reproduction Log" at the botto
+ Results reproduced by [@alirezaJvh](https://github.com/alirezaJvh) on 2024-10-05 (commit [`3f76099`](https://github.com/castorini/pyserini/commit/3f76099a73820afee12496c0354d52ca6a6175c2))
+ Results reproduced by [@Raghav0005](https://github.com/Raghav0005) on 2024-10-09 (commit [`7ed8369`](https://github.com/castorini/pyserini/commit/7ed83698298139efdfd62b6893d673aa367b4ac8))
+ Results reproduced by [@Pxlin-09](https://github.com/pxlin-09) on 2024-10-26 (commit [`af2d3c5`](https://github.com/castorini/pyserini/commit/af2d3c52953b916e242142dbcf4799ecdb9abbee))

+ Results reproduced by [@Samantha-Zhan](https://github.com/Samantha-Zhan) on 2024-11-17 (commit [`a95b0e0`](https://github.com/castorini/pyserini/commit/a95b0e04a1636e0f4151197c235c961b3c832802))
