This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models". It is built on top of code from the MEMIT repository.
To install the needed packages, first create a virtual environment via python3 -m venv env and activate it (source env/bin/activate). Then, install an appropriate version of torch for your system. Next, install the remaining requirements:
cd third_party
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt')"
We gather causal tracing results from the first 2000 points in the CounterFact dataset, filtering to 652 correctly completed prompts when using GPT-J. The window_sizes argument controls which tracing window sizes to use. To reproduce all GPT-J results in the paper, run tracing experiments for window sizes 10, 5, 3, and 1. This can be done with the following steps.
First, set the global variables in experiments/tracing.py (i.e., CODE_DIR, BASE_DIR, and MODEL_DIR) to the desired values. Then, run:
python -m experiments.tracing \
-n 2000 \
--ds_name counterfact \
--model_name EleutherAI/gpt-j-6B \
--run 1 \
--window_sizes "10 5 3 1"
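When scripting several tracing runs, it can help to build the command above in code. The following is a minimal sketch and is not part of the repository: the helper name build_tracing_command is hypothetical, and it only assembles the flags shown above without inspecting experiments/tracing.py.

```python
import shlex


def build_tracing_command(window_sizes, ds_name="counterfact",
                          model_name="EleutherAI/gpt-j-6B", n=2000, run=1):
    """Return the argv list for one tracing invocation.

    Hypothetical helper: it only assembles the flags shown in this README.
    """
    # The README passes all window sizes as one space-separated string.
    sizes = " ".join(str(w) for w in window_sizes)
    return [
        "python", "-m", "experiments.tracing",
        "-n", str(n),
        "--ds_name", ds_name,
        "--model_name", model_name,
        "--run", str(run),
        "--window_sizes", sizes,
    ]


cmd = build_tracing_command([10, 5, 3, 1])
# shlex.join quotes the window-size string so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory in which the commands in this README are meant to be run.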
To get results for ZSRE, run the following two commands (one for GPT-J and one for GPT2-XL):
python -m experiments.tracing \
-n 2000 \
--ds_name zsre \
--model_name EleutherAI/gpt-j-6B \
--run 1 \
--window_sizes "5"
python -m experiments.tracing \
-n 2000 \
--ds_name zsre \
--model_name gpt2-xl \
--run 1 \
--window_sizes "5" \
--gpu 1
We check the relationship between causal tracing localization and editing performance using several editing methods applied to five different variants of the basic model editing problem. The editing methods are:
- Constrained finetuning with Adam at one layer
- Constrained finetuning with Adam at five adjacent layers
- ROME (which edits one layer)
- MEMIT (which edits five layers)
The editing problems include the original model editing problem specified by the CounterFact dataset (changing the prediction for a given input), as well as a few variants mentioned below.
To run the default Error Injection editing problem using ROME with GPT-J, first set the global variables in experiments/evaluate.py (i.e., CODE_DIR, BASE_DIR, and MODEL_DIR) to the desired values. Then, run:
python3 -m experiments.evaluate \
-n 2000 \
--alg_name ROME \
--window_sizes "1" \
--ds_name cf \
--model_name EleutherAI/gpt-j-6B \
--run 1 \
--edit_layer -2 \
--correctness_filter 1 \
--norm_constraint 1e-4 \
--kl_factor 1 \
--fact_token subject_last
To run an experiment with ZSRE, use:
python3 -m experiments.evaluate \
-n 2000 \
--alg_name ROME \
--window_sizes "1" \
--ds_name zsre \
--model_name EleutherAI/gpt-j-6B \
--run 1 \
--edit_layer 5 \
--correctness_filter 0 \
--norm_constraint 1e-4 \
--kl_factor 1 \
--fact_token subject_last
Add the following flags for each variation of the experiments:
- Error Injection: no flag
- Tracing Reversal: --tracing_reversal
- Fact Erasure: --fact_erasure
- Fact Amplification: --fact_amplification
- Fact Forcing: --fact_forcing
For example, to run Fact Erasure with constrained finetuning across five layers, run:
python3 -m experiments.evaluate \
-n 2000 \
--alg_name FT \
--window_sizes "5" \
--ds_name cf \
--model_name EleutherAI/gpt-j-6B \
--run 1 \
--edit_layer -2 \
--correctness_filter 1 \
--norm_constraint 1e-4 \
--kl_factor .0625 \
--fact_erasure
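The mapping from editing-problem variant to flag can be captured in a small lookup table when launching many runs. The sketch below is illustrative only and is not part of the repository: VARIANT_FLAGS and build_evaluate_command are hypothetical names, and the defaults simply mirror the example commands above. Numeric values are kept as strings so that they are passed exactly as written (e.g. 1e-4, .0625).

```python
import shlex

# Flags listed in this README for each editing-problem variant.
VARIANT_FLAGS = {
    "error_injection": None,  # default problem, no extra flag
    "tracing_reversal": "--tracing_reversal",
    "fact_erasure": "--fact_erasure",
    "fact_amplification": "--fact_amplification",
    "fact_forcing": "--fact_forcing",
}


def build_evaluate_command(alg_name, variant, window_sizes="1", kl_factor="1",
                           edit_layer="-2", correctness_filter="1",
                           norm_constraint="1e-4", ds_name="cf",
                           model_name="EleutherAI/gpt-j-6B", n="2000",
                           run="1", fact_token=None):
    """Return the argv list for one evaluate invocation (hypothetical helper)."""
    if variant not in VARIANT_FLAGS:
        raise ValueError(f"unknown variant: {variant!r}")
    cmd = [
        "python3", "-m", "experiments.evaluate",
        "-n", n,
        "--alg_name", alg_name,
        "--window_sizes", window_sizes,
        "--ds_name", ds_name,
        "--model_name", model_name,
        "--run", run,
        "--edit_layer", edit_layer,
        "--correctness_filter", correctness_filter,
        "--norm_constraint", norm_constraint,
        "--kl_factor", kl_factor,
    ]
    # The ROME examples above pass --fact_token; the FT example does not.
    if fact_token is not None:
        cmd += ["--fact_token", fact_token]
    flag = VARIANT_FLAGS[variant]
    if flag is not None:
        cmd.append(flag)
    return cmd


# Mirrors the Fact Erasure example with finetuning across five layers.
cmd = build_evaluate_command("FT", "fact_erasure",
                             window_sizes="5", kl_factor=".0625")
print(shlex.join(cmd))
```

Switching to another variant only changes the second argument, for example build_evaluate_command("ROME", "fact_forcing", fact_token="subject_last").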
Data analysis for this work is done in R via the data_analysis.ipynb file. All plots and regression analyses in the paper can be reproduced via this file.
This is not an officially supported Google product.