ABFS

Code release and supplementary materials for:
"ABFS: Natural Robustness Testing for LLM-based NLP Software"

An example of robustness flaws in LLM-based NLP software

Slightly perturbed text can mislead ChatGPT o1-preview into changing its label for a financial news item from "POSITIVE" (95% confidence) to "NEGATIVE" (70% confidence).
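The example is an untargeted test. A perturbed input counts as a successful test case whenever the predicted label differs from the label the model gave the original text, whichever label it moves to. Here is a minimal sketch of that check, using the label/confidence pairs quoted above. `Prediction` and `is_successful_test_case` are hypothetical names, not the repository's API, and the confidences are carried along only for reporting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Label and confidence parsed from an LLM response (hypothetical type)."""
    label: str
    confidence: float


def is_successful_test_case(original: Prediction, perturbed: Prediction) -> bool:
    """Untargeted criterion: any change of label counts, whatever the new label is."""
    return original.label != perturbed.label


before = Prediction(label="POSITIVE", confidence=0.95)  # unperturbed news item
after = Prediction(label="NEGATIVE", confidence=0.70)   # slightly perturbed item
print(is_successful_test_case(before, after))  # True
```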

Datasets

There are three datasets used in our experiments:

Repo structure

datasets: define the dataset object used for carrying out tests
goal_functions: determine if the testing method generates successful test cases
search_methods: explore the space of transformations and try to locate a successful perturbation
transformations: transform the input text, e.g. synonym replacement
constraints: determine whether or not a given transformation is valid
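To make the division of labour more concrete, here is a toy sketch of two of these components. It includes a transformation that swaps one word for a synonym and a constraint that caps how many words may change. The synonym table, the example sentence (made up, in the style of Financial Phrasebank), and the function names are hypothetical. The repository's own transformations and constraints are presumably class-based, as in TextAttack, rather than plain functions like these.

```python
from typing import Callable, Dict, List

Words = List[str]

# Hypothetical synonym table; a real transformation would draw on WordNet, embeddings, etc.
SYNONYMS: Dict[str, Words] = {
    "profit": ["earnings", "gain"],
    "rose": ["climbed", "increased"],
}


def synonym_swaps(words: Words) -> List[Words]:
    """Transformation: every text that differs from `words` by one synonym replacement."""
    candidates = []
    for i, word in enumerate(words):
        for synonym in SYNONYMS.get(word.lower(), []):
            candidates.append(words[:i] + [synonym] + words[i + 1:])
    return candidates


def max_words_changed(limit: int) -> Callable[[Words, Words], bool]:
    """Constraint: accept a candidate only if at most `limit` positions differ.

    Word swaps keep the length fixed, so a position-wise comparison is enough here.
    """
    def check(original: Words, candidate: Words) -> bool:
        changed = sum(a != b for a, b in zip(original, candidate))
        return changed <= limit
    return check


original = "Operating profit rose to EUR 13.1 mn".split()
within_budget = max_words_changed(1)
valid = [c for c in synonym_swaps(original) if within_budget(original, c)]
for candidate in valid:
    print(" ".join(candidate))

two_swaps = "Operating earnings climbed to EUR 13.1 mn".split()
print(within_budget(original, two_swaps))  # False: two words changed
```

A search method would feed candidates like these to the goal function, and only candidates that pass every constraint would count as valid test cases.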

The most important files in this project are as follows:

goal_functions/classification/untargeted_llm_classification.py: quantify the goal of testing LLM-based NLP software on the text classification task
search_methods/best_first_word_swap_wir.py: search for test cases using adaptive best-first search
inference.py: drive the threat LLMs to run inference and process their outputs
abfs_fi_llama270b.py: an example of testing Llama-2-70b-chat on the Financial Phrasebank dataset via ABFS
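A plain best-first search can show the general idea behind searching for test cases. It keeps a priority queue of perturbed texts ordered by the model's confidence in the original label, always expands the most promising one, and stops as soon as the label flips. This sketch is a generic best-first search, not the adaptive variant that ABFS implements in `search_methods/best_first_word_swap_wir.py`. The keyword-based `toy_predict` classifier stands in for an LLM, and all names and weights are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Text = Tuple[str, ...]

# Hypothetical lexicon: each listed word adds this much to P(POSITIVE).
POSITIVE_WEIGHTS: Dict[str, float] = {
    "profit": 0.2, "rose": 0.3, "climbed": 0.2, "strong": 0.3, "solid": 0.15,
}
# Hypothetical synonym table used by the word-swap transformation.
SYNONYMS: Dict[str, List[str]] = {
    "rose": ["climbed", "moved"],
    "strong": ["solid", "notable"],
}


def toy_predict(words: Text) -> Tuple[str, float]:
    """Stand-in for the LLM: returns (label, confidence in that label)."""
    p_positive = min(0.2 + sum(POSITIVE_WEIGHTS.get(w, 0.0) for w in words), 1.0)
    if p_positive >= 0.5:
        return "POSITIVE", p_positive
    return "NEGATIVE", 1.0 - p_positive


def word_swaps(words: Text) -> List[Text]:
    """All texts reachable by replacing a single word with one of its synonyms."""
    return [
        words[:i] + (synonym,) + words[i + 1:]
        for i, word in enumerate(words)
        for synonym in SYNONYMS.get(word, [])
    ]


def best_first_search(words: Text, max_expansions: int = 50) -> Optional[Text]:
    """Return the first perturbed text whose label differs from the original, or None."""
    original_label, original_conf = toy_predict(words)
    # Heap entries are (confidence in the original label, text); lower is more promising.
    frontier = [(original_conf, words)]
    seen = {words}
    for _ in range(max_expansions):
        if not frontier:
            break
        _, current = heapq.heappop(frontier)
        for candidate in word_swaps(current):
            if candidate in seen:
                continue
            seen.add(candidate)
            label, conf = toy_predict(candidate)
            if label != original_label:
                return candidate  # successful test case: the label flipped
            heapq.heappush(frontier, (conf, candidate))
    return None


original = ("profit", "rose", "on", "strong", "sales")
result = best_first_search(original)
print(result, toy_predict(result) if result else None)
```

On this toy input no single swap flips the label. The search therefore returns the two-swap text `("profit", "moved", "on", "notable", "sales")`, reaching it by expanding the lowest-confidence one-swap candidate first.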

Dependencies

The code was tested with:

bert-score>=0.3.5
autocorrect==2.6.1
accelerate==0.25.0
datasets==2.15.0
nltk==3.8.1
openai==1.3.7
sentencepiece==0.1.99
tokenizers==0.15.0
torch==2.1.1
tqdm==4.66.1
transformers==4.38.0
Pillow==10.3.0
transformers_stream_generator==0.0.5
matplotlib==3.8.3
tiktoken==0.6.0
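One way to compare a local environment against the versions listed above is the standard library's `importlib.metadata`. The helper sketched here (`find_mismatches`) is hypothetical and not part of the repository. `bert-score` is left out because the list gives it only a lower bound.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Exact pins copied from the dependency list (bert-score>=0.3.5 is a lower bound, so it is omitted).
TESTED_VERSIONS: Dict[str, str] = {
    "autocorrect": "2.6.1",
    "accelerate": "0.25.0",
    "datasets": "2.15.0",
    "nltk": "3.8.1",
    "openai": "1.3.7",
    "sentencepiece": "0.1.99",
    "tokenizers": "0.15.0",
    "torch": "2.1.1",
    "tqdm": "4.66.1",
    "transformers": "4.38.0",
    "Pillow": "10.3.0",
    "transformers_stream_generator": "0.0.5",
    "matplotlib": "3.8.3",
    "tiktoken": "0.6.0",
}


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from its pin to (pinned, installed-or-None)."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in find_mismatches(TESTED_VERSIONS).items():
        print(f"{name}: tested with {pinned}, found {installed or 'nothing'}")
```

The comparison is an exact string match, so local build suffixes such as `+cu118` on torch wheels will also be reported as mismatches.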

How to Run:

Follow these steps to run ABFS:

Fork and clone this repository.
Run the following command to install it.
```
$ pip install -e ".[dev]"
```
Run the following command to test Llama-2-70b-chat on the Financial Phrasebank dataset via ABFS.
```
$ python abfs_fi_llama270b.py
```

To test other threat models, browse the Models section of the Hugging Face Hub.

License

This code and model are available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using the code and model you agree to the terms in the LICENSE.

Acknowledgement

This code is based on the TextAttack framework.

lumos-xiao/ABFS
