diff --git a/README.md b/README.md
index 921dc4623..e24794916 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,9 @@ With Pyserini, it's easy to reproduce runs on a number of standard IR test colle
 
 For additional details, [our paper](https://dl.acm.org/doi/10.1145/3404835.3463238) in SIGIR 2021 provides a nice overview.
 
+❗ Anserini was upgraded from JDK 11 to JDK 21 at commit [`272565`](https://github.com/castorini/anserini/commit/39cecf6c257bae85f4e9f6ab02e0be101338c3cc) (2024/04/03), which corresponds to the release of v0.35.0.
+Correspondingly, Pyserini was upgraded to JDK 21 at commit [`b2f677`](https://github.com/castorini/pyserini/commit/b2f677da46e1910c0fd95e5ff06070bc71075401) (2024/04/04).
+
 ## 🎬 Installation
 
 Install via PyPI (requires Python 3.10+):
@@ -25,7 +28,7 @@ Install via PyPI (requires Python 3.10+):
 pip install pyserini
 ```
 
-Sparse retrieval depends on [Anserini](http://anserini.io/), which is itself built on Lucene, and thus Java 11.
+Sparse retrieval depends on [Anserini](http://anserini.io/), which is itself built on Lucene (written in Java), and thus requiring JDK 21.
 Dense retrieval depends on neural networks and requires a more complex set of dependencies.
 A `pip` installation will automatically pull in the [🤗 Transformers library](https://github.com/huggingface/transformers) to satisfy the package requirements.
 
@@ -188,6 +191,7 @@ Additional reproduction guides below provide detailed step-by-step instructions.
 
 ## 📜️ Release History
 
++ v0.35.0 (w/ Anserini v0.35.0): April 4, 2024 [[Release Notes](docs/release-notes/release-notes-v0.35.0.md)]
 + v0.25.0 (w/ Anserini v0.25.0): March 31, 2024 [[Release Notes](docs/release-notes/release-notes-v0.25.0.md)]
 + v0.24.0 (w/ Anserini v0.24.0): December 28, 2023 [[Release Notes](docs/release-notes/release-notes-v0.24.0.md)]
 + v0.23.0 (w/ Anserini v0.23.0): November 17, 2023 [[Release Notes](docs/release-notes/release-notes-v0.23.0.md)]
diff --git a/docs/installation.md b/docs/installation.md
index 4c0465d94..f59927ee4 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -1,6 +1,6 @@
 # Pyserini: Detailed Installation Guide
 
-Pyserini requires Python 3.10+.
+Pyserini is built on Python 3.10.
 At a high level, we try to keep our [`requirements.txt`](../requirements.txt) up to date.
 Pyserini has a number of important dependencies:
 
@@ -25,20 +25,20 @@ conda create -n pyserini python=3.10 -y
 conda activate pyserini
 ```
 
-If you do not already have JDK 11 installed, install via `conda`:
+If you do not already have JDK 21 installed, install via `conda`:
 
 ```bash
-conda install -c conda-forge openjdk=11 maven -y
+conda install -c conda-forge openjdk=21 maven -y
 ```
 
-If your system already has JDK 11 installed, the above step can be skipped.
+If your system already has JDK 21 installed, the above step can be skipped.
 Use `java --version` to check one way or the other.
 
 If you're on an Intel-based Mac, the following recipe should work:
 
 ```bash
 conda install wget -y
-conda install -c conda-forge openjdk=11 maven -y
+conda install -c conda-forge openjdk=21 maven -y
 conda install -c conda-forge lightgbm nmslib -y
 
 # from https://github.com/facebookresearch/faiss/blob/main/INSTALL.md
@@ -53,19 +53,15 @@ If you're on a Mac with an M-series (i.e., ARM) processor, the following recipe
 
 ```bash
 conda install wget -y
-conda install -c conda-forge openjdk=11 maven -y
-conda install -c conda-forge lightgbm -y
-
-# from https://github.com/nmslib/nmslib/issues/476#issuecomment-1594889437
-CFLAGS="-mavx -DWARN(a)=(a)" pip install --use-pep517 nmslib
-
-# from https://github.com/facebookresearch/faiss/blob/main/INSTALL.md
-conda install -c pytorch faiss-cpu=1.7.4 blas=1.0 -y
+conda install -c conda-forge openjdk=21 maven -y
+conda install -c conda-forge lightgbm nmslib -y
 conda install -c pytorch faiss-cpu pytorch -y
 
 pip install pyserini
 ```
 
+As of April 2024, for `faiss-cpu`, `osx-64` is still at v1.7.4, whereas `osx-arm64` is at v1.8.0; hence the differences in the instructions above.
+
 ### Linux
 
 On Linux, `pip` is an alternative that's a bit more lightweight:
@@ -122,13 +118,9 @@ If everything is working properly, you should be able to reproduce the results a
 If you're planning on just _using_ Pyserini, then the instructions above are fine.
 However, if you're planning on contributing to the codebase or want to work with the latest not-yet-released features, you'll need a development installation.
 
-Install dependencies:
-
-```bash
-pip install torch faiss-cpu cohere
-```
+Start the same way as the install above, but **don't** install `pip install pyserini`.
 
-Clone the Pyserini repo with the `--recurse-submodules` option to make sure the `tools/` submodule also gets cloned:
+Instead, clone the Pyserini repo with the `--recurse-submodules` option to make sure the `tools/` submodule also gets cloned:
 
 ```bash
 git clone git@github.com:castorini/pyserini.git --recurse-submodules
@@ -142,13 +134,7 @@ cd tools/eval && tar xvfz trec_eval.9.0.4.tar.gz && cd trec_eval.9.0.4 && make &
 cd tools/eval/ndeval && make && cd ../../..
 ```
 
-You can then set up your Python environment in exactly the same way as a `pip` installation, except replace this:
-
-```bash
-pip install pip
-```
-
-With an ["editable" installation](https://setuptools.pypa.io/en/latest/userguide/development_mode.html), as follows:
+Then, in the `pyserini` clone, use `pip` to add an ["editable" installation](https://setuptools.pypa.io/en/latest/userguide/development_mode.html), as follows:
 
 ```bash
 pip install -e .
@@ -177,13 +163,13 @@ Assuming all tests pass, you should be ready to go!
 
 + The above guide handle JVM installation via conda. If you are using your own Java environment and get an error about Java version mismatch, it's likely an issue with your `JAVA_HOME` environmental variable.
 In `bash`, use `echo $JAVA_HOME` to find out what the environmental variable is currently set to, and use `export JAVA_HOME=/path/to/java/home` to change it to the correct path.
-On a Linux system, the correct path might look something like `/usr/lib/jvm/java-11`.
+On a Linux system, the correct path might look something like `/usr/lib/jvm/java-21`.
 Unfortunately, we are unable to offer more concrete advice since the actual path depends on your OS, which JDK you're using, and a host of other factors.
 + On Apple's M-series processors, make sure you've installed the ARM-based release of Conda instead of the Intel-based release.
 
 ## Internal Notes
 
-At the University of Waterloo, we have two (CPU) development servers, `tuna` and `ocra`.
+At the University of Waterloo, we have two (CPU) development servers, `tuna` and `orca`.
 Note that on these two servers, the root disk (where your home directory is mounted) doesn't have much space.
 So, you need to set pyserini cache path to scratch space.
 
diff --git a/docs/release-notes/release-notes-v0.35.0.md b/docs/release-notes/release-notes-v0.35.0.md
new file mode 100644
index 000000000..7309d1000
--- /dev/null
+++ b/docs/release-notes/release-notes-v0.35.0.md
@@ -0,0 +1,51 @@
+# Pyserini Release Notes (v0.35.0)
+
++ **Release date:** April 4, 2024
++ **Anserini dependency:** v0.35.0
++ **Lucene dependency:** v9.9.1
+
+## Summary of Changes
+
++ Upgraded to JDK 21.
+
+## Contributors
+
+### This Release
+
+Sorted by number of commits:
+
++ Ashish Kumar ([ashishakkumar](https://github.com/ashishakkumar))
++ Grace He ([Lindaaa8](https://github.com/Lindaaa8))
++ Jimmy Lin ([lintool](https://github.com/lintool))
++ Sahel Sharifymoghaddam ([sahel-sh](https://github.com/sahel-sh))
+
+### All Time
+
+All contributors with five or more commits, sorted by number of commits, [according to GitHub](https://github.com/castorini/pyserini/graphs/contributors):
+
++ Jimmy Lin ([lintool](https://github.com/lintool))
++ Xueguang Ma ([MXueguang](https://github.com/MXueguang))
++ Xinyu (Crystina) Zhang ([crystina-z](https://github.com/crystina-z))
++ Yuqi Liu ([yuki617](https://github.com/yuki617))
++ Johnson Han ([x65han](https://github.com/x65han))
++ Stephanie Hu ([stephaniewhoo](https://github.com/stephaniewhoo))
++ Arthur Chen ([ArthurChen189](https://github.com/ArthurChen189))
++ Jasper Xian ([jasper-xian](https://github.com/jasper-xian))
++ Manveer Tamber ([manveertamber](https://github.com/manveertamber))
++ Jack Lin ([jacklin64](https://github.com/jacklin64))
++ Sahel Sharifymoghaddam ([sahel-sh](https://github.com/sahel-sh))
++ Jheng-Hong Yang ([justram](https://github.com/justram))
++ Minghan Li ([alexlimh](https://github.com/alexlimh))
++ Mofe Adeyemi ([Mofetoluwa](https://github.com/Mofetoluwa))
++ Catherine Zhou ([Cathrineee](https://github.com/Cathrineee))
++ Ogundepo Odunayo ([ToluClassics](https://github.com/ToluClassics))
++ Hang Li ([hanglics](https://github.com/hanglics))
++ Ronak Pradeep ([ronakice](https://github.com/ronakice))
++ Chris Kamphuis ([Chriskamphuis](https://github.com/Chriskamphuis))
++ Zeynep Akkalyoncu Yilmaz ([zeynepakkalyoncu](https://github.com/zeynepakkalyoncu))
++ Xinyu Mavis Liu ([x389liu](https://github.com/x389liu))
++ Sailesh Nankani ([saileshnankani](https://github.com/saileshnankani))
++ Shengyao Zhuang ([ArvinZhuang](https://github.com/ArvinZhuang))
++ Habeeb Shopeju ([HAKSOAT](https://github.com/HAKSOAT))
++ Ehsan ([ehsk](https://github.com/ehsk))
++ Pepijn Boers ([PepijnBoers](https://github.com/PepijnBoers))