The corresponding code for our paper: Evaluating the Impact of Retrieval on Multi-document Summarization.
The raw experimental results, including the output of the human evaluation, can be downloaded here.
Warning: The download is approximately 7GB compressed.
This repository requires Python 3.9 or later.
First, activate a virtual environment. Then, install with pip directly from GitHub:
pip install "git+https://github.com/allenai/open-mds.git"
or clone the repo locally and install from source:
git clone https://github.com/allenai/open-mds.git
cd open-mds
pip install -e .
To install using Poetry (this will activate a virtual environment for you):
# Install poetry for your system: https://python-poetry.org/docs/#installation
# E.g. for Linux, macOS, Windows (WSL)
curl -sSL https://install.python-poetry.org | python3 -
# Clone and move into the repo
git clone https://github.com/allenai/open-mds
cd open-mds
# Install the package with poetry
poetry install
There are several ways to interact with this codebase. If you are interested in analyzing the results of our experiments or reproducing tables or figures from the paper, please see Notebooks. If you are interested in running the open-domain MDS experiments, see Open-domain MDS. If you are interested in running our experiments simulating document retrieval errors, see Simulating Document Retrieval Errors. If you would like to reproduce the results in the paper from scratch, please see our detailed instructions here.
We have notebooks corresponding to each of the major experiments in the paper:
- Dataset Statistics: Compute simple dataset statistics for each dataset in the paper.
- Open-Domain MDS: Analyze the results from the open-domain MDS experiments.
- Baselines: Compute the summarization performance of several simple baselines for each dataset in the paper.
- Training: Analyze the results from the experiments where we fine-tune summarizers in the open-domain setting.
- Perturbations: Analyze the results of our simulated document retrieval error experiments.
- Sorting: Analyze the results of the sorting perturbation experiment.
- Human Evaluation: Set up and analyze the results of the human evaluation.
If you are running the notebooks locally, make sure to add a virtual environment with this project installed in it to an IPython kernel:
pip install --user ipykernel
python -m ipykernel install --user --name=<myenv>
You can now select this environment as a kernel in the notebook. See here for more details.
Note that some IDEs, like VS Code, will automate this process when you launch a notebook with the virtual environment active.
In the paper, we bootstrap the newly proposed task of open-domain MDS using existing datasets, retrievers, and summarizers. To run these experiments, there are two steps:
This is only required if you want to re-index the datasets and re-retrieve the input documents. We have already done this and made the resulting datasets publicly available. Just look for "allenai/[dataset_name]_[sparse|dense]_[max|mean|oracle]" on HuggingFace, e.g. "allenai/multinews_sparse_mean".
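As a minimal sketch of the naming pattern above, the helper below builds a dataset ID and, optionally, loads it with the HuggingFace `datasets` library. The function names `hub_dataset_name` and `load_retrieved_dataset` are hypothetical and not part of this repository; only "allenai/multinews_sparse_mean" is confirmed by this README, so other combinations are assumed to follow the same pattern.

```python
from itertools import product

# Values taken from the naming pattern in this README.
RETRIEVERS = ("sparse", "dense")
TOP_K_STRATEGIES = ("max", "mean", "oracle")


def hub_dataset_name(dataset: str, retriever: str, top_k_strategy: str) -> str:
    """Build an ID of the form allenai/[dataset_name]_[retriever]_[strategy]."""
    if retriever not in RETRIEVERS:
        raise ValueError(f"unknown retriever: {retriever!r}")
    if top_k_strategy not in TOP_K_STRATEGIES:
        raise ValueError(f"unknown top-k strategy: {top_k_strategy!r}")
    return f"allenai/{dataset}_{retriever}_{top_k_strategy}"


def load_retrieved_dataset(dataset: str, retriever: str, top_k_strategy: str):
    """Download a pre-built dataset (needs the `datasets` package and network access)."""
    # Imported lazily so that the name helper above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(hub_dataset_name(dataset, retriever, top_k_strategy))


# List every ID the pattern would produce for Multi-News (no download happens here).
for retriever, strategy in product(RETRIEVERS, TOP_K_STRATEGIES):
    print(hub_dataset_name("multinews", retriever, strategy))
```

Calling `load_retrieved_dataset("multinews", "sparse", "mean")` would then fetch the example dataset named above.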
Otherwise, please use the index_and_retrieve.py script. First, make sure you have installed the required dependencies
# With pip
pip install "git+https://github.com/allenai/open-mds.git#egg=open_mds[retrieval]"
# OR, if installing with poetry
poetry install -E "retrieval"
Then you can see detailed instructions by calling
python ./scripts/index_and_retrieve.py --help
Here are a few examples:
Re-build the examples of the Multi-News dataset with a "sparse" retriever and "oracle" top-k strategy
python ./scripts/index_and_retrieve.py "multinews" "./output/datasets/multinews_sparse_oracle" \
--retriever "sparse" \
--top-k-strategy "oracle"
Re-build the examples of the MS^2 dataset with a "dense" retriever and "mean" top-k strategy
python ./scripts/index_and_retrieve.py "ms2" "./output/datasets/ms2_dense_mean" \
--retriever "dense" \
--top-k-strategy "mean"
Other experiments can be crafted by modifying the arguments accordingly.
- If index-path is not provided, document indices will be saved to disk under a default location. You can get this path by calling python -c "from open_mds.common import util ; print(util.CACHE_DIR)"
- If you wish to use the dense retriever, you will need to install FAISS with GPU support. See here for detailed instructions.
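To re-build several configurations in one go, the two example invocations above could be generated programmatically. The following is a rough sketch, not a script shipped with this repository: `index_and_retrieve_command` and `run_grid` are hypothetical helpers, and the output directory layout simply mirrors the examples.

```python
import shlex
import subprocess
from itertools import product


def index_and_retrieve_command(dataset, retriever, top_k_strategy,
                               output_root="./output/datasets"):
    """Assemble the argument list for one index_and_retrieve.py run."""
    return [
        "python", "./scripts/index_and_retrieve.py",
        dataset,
        f"{output_root}/{dataset}_{retriever}_{top_k_strategy}",
        "--retriever", retriever,
        "--top-k-strategy", top_k_strategy,
    ]


def run_grid(datasets, retrievers, strategies, dry_run=True):
    """Print (and optionally execute) one command per combination."""
    commands = [
        index_and_retrieve_command(d, r, s)
        for d, r, s in product(datasets, retrievers, strategies)
    ]
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # Stop at the first failing run instead of continuing silently.
            subprocess.run(command, check=True)
    return commands


# Dry run: only prints the commands, nothing is executed.
commands = run_grid(["multinews", "ms2"], ["sparse", "dense"], ["mean", "oracle"])
```

Passing `dry_run=False` would execute each command from the repository root; the dense runs still require FAISS with GPU support, as noted above.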
Usage is similar to the original run_summarization.py script from HuggingFace, but with extra arguments for the retrieval. To make things easier, we provide configs for the models and datasets we investigated. Here are a few examples:
1️⃣ Evaluate PEGASUS with a "sparse" retriever and "mean" top-k strategy on the Multi-News dataset
python ./scripts/run_summarization.py "./conf/base.yml" "./conf/multinews/pegasus/eval.yml" \
output_dir="./output/multinews/pegasus/retrieval/sparse/mean" \
dataset_name="allenai/multinews_sparse_mean" \
retriever="sparse" \
top_k_strategy="mean"
2️⃣ Evaluate PRIMERA with a "dense" retriever and "oracle" top-k strategy on the Multi-XScience dataset
python ./scripts/run_summarization.py "./conf/base.yml" "./conf/multixscience/primera/eval.yml" \
output_dir="./output/multixscience/primera/retrieval/dense/oracle" \
dataset_name="allenai/multixscience_dense_oracle" \
retriever="dense" \
top_k_strategy="oracle"
Other experiments can be crafted by modifying the arguments accordingly.
In the paper, we simulate document retrieval errors by perturbing the input document sets of several popular MDS datasets before they are provided to the summarizer. Each of the perturbations is designed to mimic an error likely to be made by a retriever in the open-domain setting:
- "addition": Add one or more documents to the input to mimic the retrieval of irrelevant documents.
- "deletion": Remove one or more documents from the input to mimic the failure to retrieve relevant documents.
- "duplication": Duplicate input documents to simulate the retrieval of duplicate (or near-duplicate) documents from the index.
- "replacement": Replace one or more documents in the input with another document. This is a combination of "addition" and "deletion".
- "sorting": Sort (or shuffle) the order of input documents to simulate different rank-ordered lists from a retriever.
- "backtranslation": Replace one or more documents in the input with a backtranslated copy. This is not an error a retriever would make, but allows us to compare and contrast the known sensitivity of NLP models to small token-level changes in their inputs with the document-level changes we are interested in.
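To make the perturbations listed above concrete, here is a small, self-contained sketch of five of them with random document selection. It is an illustration only, not the implementation in this repository: the function `perturb`, its parameters (`pool`, `frac`, `seed`), and the rounding rule for the number of perturbed documents are all assumptions. "backtranslation" is left out because it needs a translation model.

```python
import random


def perturb(docs, perturbation, pool=(), frac=0.5, seed=0):
    """Apply one document-level perturbation using random selection.

    docs: input documents of one example.
    pool: documents from other examples (used by "addition" and "replacement").
    frac: fraction of the input documents to perturb.
    """
    rng = random.Random(seed)
    docs = list(docs)
    pool = list(pool)
    # Assumption: perturb at least one document, and never more than exist.
    k = min(len(docs), max(1, round(frac * len(docs))))

    if perturbation == "addition":
        # Raises ValueError if the pool holds fewer than k documents.
        return docs + rng.sample(pool, k)
    if perturbation == "deletion":
        drop = set(rng.sample(range(len(docs)), k))
        return [d for i, d in enumerate(docs) if i not in drop]
    if perturbation == "duplication":
        return docs + rng.sample(docs, k)
    if perturbation == "replacement":
        # Delete k documents and add k new ones in their place.
        positions = rng.sample(range(len(docs)), k)
        for position, new_doc in zip(positions, rng.sample(pool, k)):
            docs[position] = new_doc
        return docs
    if perturbation == "sorting":
        rng.shuffle(docs)
        return docs
    raise ValueError(f"unsupported perturbation: {perturbation!r}")


docs = ["doc A", "doc B", "doc C", "doc D"]
pool = ["other X", "other Y", "other Z"]
for name in ("addition", "deletion", "duplication", "replacement", "sorting"):
    print(name, perturb(docs, name, pool=pool, frac=0.5))
```

Fixing `seed` keeps a perturbed input reproducible across runs, which is useful when comparing summarizers on the same perturbed examples.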
We include two different document "selection strategies" that apply to each perturbation:
- "random": Randomly selects documents for each perturbation. This mimics a (very) weak retriever.
- "oracle": Attempts to select documents in a way that mimics a strong retriever. E.g. for "deletion", documents least similar to the target summary are removed first.
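The difference between the two strategies can be sketched for "deletion" as follows. This is a toy illustration under stated assumptions: `token_overlap` (Jaccard overlap of word sets) is a stand-in similarity chosen for brevity, not the measure used in the paper, and `select_for_deletion` is a hypothetical helper.

```python
import random


def token_overlap(document: str, summary: str) -> float:
    """Jaccard overlap between lowercased word sets (stand-in similarity)."""
    doc_tokens = set(document.lower().split())
    summary_tokens = set(summary.lower().split())
    union = doc_tokens | summary_tokens
    return len(doc_tokens & summary_tokens) / len(union) if union else 0.0


def select_for_deletion(docs, summary, k, strategy="random", seed=0):
    """Return the sorted indices of the k documents to delete."""
    k = min(k, len(docs))
    if strategy == "random":
        return sorted(random.Random(seed).sample(range(len(docs)), k))
    if strategy == "oracle":
        # Rank documents from least to most similar to the target summary,
        # then delete the least similar ones first.
        ranked = sorted(range(len(docs)),
                        key=lambda i: token_overlap(docs[i], summary))
        return sorted(ranked[:k])
    raise ValueError(f"unknown selection strategy: {strategy!r}")


docs = [
    "the cat sat on the mat",
    "stock prices fell sharply",
    "a cat sat quietly",
]
summary = "the cat sat"
print(select_for_deletion(docs, summary, k=1, strategy="oracle"))  # [1]
```

Here the oracle strategy deletes the off-topic second document, whereas the random strategy may delete any of the three.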
Please see the paper for more details on the experimental setup.
Usage is similar to the original run_summarization.py script from HuggingFace, but with extra arguments for the perturbation. To make things easier, we provide configs for the models and datasets we investigated. Here are a few examples:
1️⃣ Evaluate PEGASUS with the "deletion" perturbation and "random" document selection strategy, perturbing 10% of input documents on the Multi-News dataset
python ./scripts/run_summarization.py "./conf/base.yml" "./conf/multinews/pegasus/eval.yml" \
output_dir="./output/multinews/pegasus/perturbations/deletion/random/0.10" \
perturbation="deletion" \
selection_strategy="random" \
perturbed_frac=0.10
2️⃣ Evaluate PRIMERA with the "addition" perturbation and "oracle" document selection strategy, perturbing 50% of input documents on the Multi-XScience dataset
python ./scripts/run_summarization.py "./conf/base.yml" "./conf/multixscience/primera/eval.yml" \
output_dir="./output/multixscience/primera/perturbations/addition/oracle/0.50" \
perturbation="addition" \
selection_strategy="oracle" \
perturbed_frac=0.50
Other experiments can be crafted by modifying the arguments accordingly.
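For instance, a sweep over several perturbation fractions could be generated with a few lines of Python. This is only a sketch: `perturbation_command` is a hypothetical helper, the output directory layout mirrors the examples above, and it assumes `perturbed_frac` is passed as a `key=value` override like the other arguments.

```python
import shlex


def perturbation_command(dataset, model, perturbation, selection_strategy,
                         perturbed_frac):
    """Assemble the argument list for one perturbation evaluation run."""
    frac = f"{perturbed_frac:.2f}"  # e.g. 0.1 -> "0.10", as in the example paths
    return [
        "python", "./scripts/run_summarization.py",
        "./conf/base.yml",
        f"./conf/{dataset}/{model}/eval.yml",
        f"output_dir=./output/{dataset}/{model}/perturbations/"
        f"{perturbation}/{selection_strategy}/{frac}",
        f"perturbation={perturbation}",
        f"selection_strategy={selection_strategy}",
        # Assumption: passed as key=value, like the other overrides.
        f"perturbed_frac={frac}",
    ]


# The two fractions used in the examples above.
sweep = [
    perturbation_command("multinews", "pegasus", "deletion", "random", frac)
    for frac in (0.10, 0.50)
]
for command in sweep:
    print(shlex.join(command))
```

Each printed line can be pasted into a shell at the repository root, or the argument lists can be passed to `subprocess.run`.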
- To avoid duplicate computation, some perturbations (like backtranslation) will cache their results. You can get the cache directory by calling
python -c "from open_mds.common import util ; print(util.CACHE_DIR)"