This is the code repository for the ACL2024 paper "Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts?"
- Environment:

  ```bash
  pip install -r requirements.txt
  ```

  You also need to clone the official Llama 2 inference code into the project path:

  ```bash
  git clone -b llama_v2 https://github.com/meta-llama/llama.git
  ```

- Download the prepared contexts and intermediate results to the project directory: link
- Build the CC dataset and calculate DiffGR:

  ```bash
  python construct_cc_and_analyse.py
  ```

  The resulting DiffGR matches Table 5, reflecting the reader's bias after excluding the effects of parametric knowledge. By adjusting the parameters in `construct_cc_and_analyse.py`, you can obtain datasets with different combinations of readers and generators (7b-chat, 13b-chat, gpt-4-0613, or gpt-3.5-turbo-0613).
| Reader | Generator | Dataset | DiffGR |
|---|---|---|---|
| 13b-chat | 13b-chat | NQ-AIR | 0.5785 |
| 13b-chat | 13b-chat | NQ-AIG | 0.9012 |
| 13b-chat | 13b-chat | TQA-AIR | 0.6069 |
| 13b-chat | 13b-chat | TQA-AIG | 0.8968 |
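As a rough illustration of the statistic reported above, here is a minimal, hypothetical sketch of computing a DiffGR-style preference ratio from per-example answers. The field names and the matching heuristic are assumptions for illustration; `construct_cc_and_analyse.py` remains the authoritative implementation.

```python
# Hypothetical sketch (not the repository's implementation): given per-example
# records of the reader's answer when both contexts are shown, estimate how often
# it is consistent with the generated vs. the retrieved context, and report the
# difference as a DiffGR-style statistic. Field names are illustrative.
from typing import Iterable, Mapping


def _consistent(prediction: str, reference: str) -> bool:
    # Loose containment match, purely for illustration.
    return reference.strip().lower() in prediction.strip().lower()


def preference_stats(records: Iterable[Mapping[str, str]]) -> dict:
    records = list(records)
    total = len(records)
    n_gen = sum(_consistent(r["answer_with_both"], r["generated_answer"]) for r in records)
    n_ret = sum(_consistent(r["answer_with_both"], r["retrieved_answer"]) for r in records)
    return {
        "ratio_generated": n_gen / total,
        "ratio_retrieved": n_ret / total,
        "diff_gr": (n_gen - n_ret) / total,  # one plausible reading of DiffGR
    }
```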
- First, we prepare the retrieved context and the generated context and ensure their lengths match, following `scripts/prepare_contexts.md` (a length-matching sketch is given after these steps).
- Next, the LLM answers questions based on different contexts, following `scripts/answer_with_different_contexts.md` (a prompt-construction sketch is given after these steps):
  - Without context: to determine the LLM's parametric knowledge.
  - Using only one type of context (generated or retrieved): to determine what each type of context provides.
  - Using both the generated and retrieved contexts simultaneously: to analyze which context the LLM relies on.
- Finally, construct the CC dataset and analyze the LLM's preference:

  ```bash
  python construct_cc_and_analyse.py
  ```
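Below is a minimal sketch of the length-matching step, assuming a Hugging Face tokenizer for Llama-2-13b-chat; the tokenizer name and the truncation strategy are assumptions, and `scripts/prepare_contexts.md` describes the actual procedure.

```python
# Hypothetical sketch of length matching (see scripts/prepare_contexts.md for the
# actual procedure). Trims the retrieved and generated contexts to the same number
# of tokens so that context length does not confound the reader's preference.
from transformers import AutoTokenizer

# Assumed tokenizer; the repository may load the official Llama 2 tokenizer instead.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf")


def match_lengths(retrieved: str, generated: str) -> tuple[str, str]:
    r_ids = tokenizer.encode(retrieved, add_special_tokens=False)
    g_ids = tokenizer.encode(generated, add_special_tokens=False)
    target = min(len(r_ids), len(g_ids))  # trim both to the shorter length
    return tokenizer.decode(r_ids[:target]), tokenizer.decode(g_ids[:target])
```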
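And a minimal sketch of the three answering settings (no context, one context, both contexts); the prompt template is an assumption, see `scripts/answer_with_different_contexts.md` for the prompts actually used.

```python
# Hypothetical sketch of the three answering settings. The template wording is
# illustrative only; it is not the repository's exact prompt.
from typing import Optional


def build_prompt(question: str,
                 retrieved: Optional[str] = None,
                 generated: Optional[str] = None) -> str:
    passages = [c for c in (generated, retrieved) if c]
    if not passages:
        # Without context: probes the model's parametric knowledge.
        return f"Answer the question.\nQuestion: {question}\nAnswer:"
    # One or both contexts: answer according to the given passage(s).
    joined = "\n\n".join(passages)
    return (
        "Answer the question according to the given passages.\n"
        f"Passages:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )
```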