More prebuilt indexes features #235

Merged
merged 10 commits into from
Sep 29, 2020

Conversation

qguo96
Contributor

@qguo96 qguo96 commented Sep 27, 2020

  • Add SimpleSearcher.list_prebuilt_indexes(), IndexReader.from_prebuilt_index(), and IndexReader.list_prebuilt_indexes().
  • Replace the dictionary of prebuilt index information with a JSON file.

@qguo96
Contributor Author

qguo96 commented Sep 27, 2020

This is a demo of the function list_prebuilt_indexes():
Screenshot (1223)
Screenshot (1224)

@lintool
Member

lintool commented Sep 27, 2020

Nice! A few suggestions/comments:

  • What if the output is just a pandas DataFrame?
  • size: just report the exact number of bytes (uncompressed = sum of bytes of all files)?
  • unique terms = -1: that might be an issue with an underlying method. Can you file a separate issue and I'll look into it?
  • pyserini/indexes_info.json: can we just take the JSON structures and move them directly into Python (i.e., basically make it a `.py` file)? I think this will make PyPI distribution easier, so we don't forget about an additional file.
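
The size and `.py` suggestions can be sketched as follows. This is a minimal illustration, not the code in this PR: the `INDEX_INFO` name, its keys, and the `uncompressed_size_bytes` helper are all hypothetical. It assumes the metadata lives in a plain Python dictionary and that "uncompressed size" means the sum of the sizes of every regular file under the index directory.

```python
import os

# Hypothetical metadata kept directly in a .py module instead of a JSON file.
# The entry name and fields are placeholders, not real Pyserini index data.
INDEX_INFO = {
    "example-index": {
        "description": "Placeholder entry for illustration",
        "filename": "example-index.tar.gz",
    },
}


def uncompressed_size_bytes(index_dir):
    """Return the total size, in bytes, of all regular files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip symbolic links so each file on disk is counted at most once.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Because the dictionary is ordinary Python source, it ships with the package like any other module, which is the packaging benefit the last bullet describes.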

@qguo96
Contributor Author

qguo96 commented Sep 27, 2020

Hi @lintool, I just created an issue here: #236.

@qguo96 qguo96 marked this pull request as draft September 27, 2020 21:55
@qguo96 qguo96 marked this pull request as ready for review September 27, 2020 21:58
@qguo96
Contributor Author

qguo96 commented Sep 27, 2020

  • Both the uncompressed size and the compressed size are now displayed in bytes.
  • The output is now a pandas DataFrame.
  • The JSON file has been rewritten as a .py file.
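
A rough sketch of how a metadata dictionary could be turned into tabular rows, one per index, is shown here. The `INDEX_INFO` contents, the column names, and the `to_rows` helper are assumptions made for illustration; the actual structure in this PR may differ.

```python
# Hypothetical stand-in for the metadata module; names and sizes are made up.
INDEX_INFO = {
    "example-a": {
        "description": "First placeholder index",
        "compressed_size": 1024,
        "uncompressed_size": 4096,
    },
    "example-b": {
        "description": "Second placeholder index",
        "compressed_size": 2048,
        "uncompressed_size": 8192,
    },
}

# Columns to show, in display order (assumed, not taken from the PR).
COLUMNS = ("description", "compressed_size", "uncompressed_size")


def to_rows(info):
    """Flatten the metadata dictionary into a list of row dictionaries."""
    rows = []
    for name in sorted(info):
        row = {"name": name}
        for col in COLUMNS:
            # Missing fields become None rather than raising KeyError.
            row[col] = info[name].get(col)
        rows.append(row)
    return rows
```

Passing the result to `pandas.DataFrame(to_rows(INDEX_INFO))` gives a frame with one row per index, since the `DataFrame` constructor accepts a list of dictionaries.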

Here is a demo of output:
Screenshot (1229)
Screenshot (1230)

Member

@lintool lintool left a comment

Looks good. Please add docstrings, and we're ready for merging.

pyserini/index/_base.py
@qguo96
Contributor Author

qguo96 commented Sep 29, 2020

add information about enwiki and zhwiki.

add information of wiki
@lintool lintool merged commit 2ed2acc into castorini:master Sep 29, 2020