evaluation

Required packages

torch               2.0.0
transformers        4.36.2
numpy               1.26.3
tqdm                4.66.1
scikit-learn        1.4.0
rouge_score         0.1.2
nltk                3.8.1
accelerate          0.26.1

For query understanding tasks and document understanding tasks (qu-du-tasks)

This evaluation script uses PyTorch DDP (DistributedDataParallel) for text generation.
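For orientation, data-parallel generation usually means that each process holds its own copy of the model and handles its own slice of the test set. Below is a minimal sketch of such a split, assuming a simple strided assignment. `shard_indices` is a hypothetical helper written for illustration, not a function from this repository, and the script's actual sharding may differ.

```python
def shard_indices(n_examples: int, rank: int, world_size: int) -> list[int]:
    """Return the example indices handled by one process (strided split)."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    # Process `rank` takes examples rank, rank + world_size, rank + 2 * world_size, ...
    return list(range(rank, n_examples, world_size))


# Example: 10 test examples split across 4 processes (hypothetical sizes).
shards = [shard_indices(10, rank, 4) for rank in range(4)]
for rank, shard in enumerate(shards):
    print(f"rank {rank}: {shard}")
```

Every example lands in exactly one shard, so the per-process outputs can be concatenated and re-sorted by index before scoring.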

Download the test data and save it to data/in-domain/zero_shot/. The directory structure should look like this:

qu-du-tasks
├── eval_sampling.py
├── inference_dataset.py
├── inference_qu_du.py
├── inference_tasks
│   ├── conversational_qa.py
│   ├── fact_verification.py
│   └── ...
└── data
    └── in-domain
        └── zero-shot
            ├── conversational_qa_coqa.zero_shot.test.jsonl
            ├── conversational_qa_quac.zero_shot.test.jsonl
            ├── fact_verification_climate_fever.zero_shot.test.jsonl
            ├── fact_verification_fever.zero_shot.test.jsonl
            ├── fact_verification_scifact.zero_shot.test.jsonl
            └── ...
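The test files in the tree above follow the pattern `<task>_<dataset>.<n_shots>.<split>.jsonl`. As a quick sanity check before running the evaluation, the file names can be parsed and listed. This is only a sketch: `parse_test_file_name` and `list_test_files` are hypothetical helpers, not part of the repository, and they assume that every file name has exactly these four dot-separated parts.

```python
from pathlib import Path


def parse_test_file_name(file_name: str) -> dict:
    """Split '<task>_<dataset>.<n_shots>.<split>.jsonl' into its parts."""
    parts = file_name.split(".")
    if len(parts) != 4 or parts[-1] != "jsonl":
        raise ValueError(f"unexpected file name: {file_name}")
    task_and_dataset, n_shots, split, _ = parts
    return {"task_and_dataset": task_and_dataset, "n_shots": n_shots, "split": split}


def list_test_files(data_dir: str) -> list[dict]:
    """Describe every .jsonl file found directly under data_dir."""
    return [parse_test_file_name(p.name) for p in sorted(Path(data_dir).glob("*.jsonl"))]


print(parse_test_file_name("conversational_qa_coqa.zero_shot.test.jsonl"))
```

Calling `list_test_files` on the directory that holds the downloaded files shows which task and dataset combinations are available locally.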

If you place the test files in a different directory, modify the path in the get_path() function of each task file under the inference_tasks directory.

Run the evaluation with:

TOKENIZERS_PARALLELISM=True python3 inference_qu_du.py \
    --model_name_or_path your/model/path \
    --tokenizer_name your/tokenizer/path \
    --setting in-domain \
    --n_shots zero_shot

For query-document relationship understanding tasks (qdu-tasks)

Download the test data and save it to data/. The directory structure should look like this:

qdu-tasks
├── cqa.sh
├── eval_rank.py
├── postprocess_cqa.py
├── run_eval.sh
└── data
    ├── cqadupstack
    │   ├── android
    │   │   └── test.pt.key.do-not-overwrite.json
    │   ├── english
    │   │   └── test.pt.key.do-not-overwrite.json
    │   └── ...
    ├── arguana.bm25.100.jsonl
    ├── climate_fever.bm25.100.jsonl
    └── ...

For datasets other than cqadupstack, modify the paths in run_eval.sh, then run the script:

MODEL_PATH="your/model/path"
TOKENIZER_PATH="your/tokenizer/path"
RESULT_PATH="your/result/path"
EVAL_DATA_PATH="data"

-----------------------
bash run_eval.sh

For the cqadupstack dataset, modify the paths in cqa.sh, then run the script:

MODEL_PATH="your/model/path"
TOKENIZER_PATH="your/tokenizer/path"
RESULT_PATH="your/result/path"

-----------------------
bash cqa.sh

This script supports pointwise, pairwise, and listwise reranking methods. To select one, modify the parameters passed to eval_rerank.py in run_eval.sh or cqa.sh:

# pointwise (default):
--rerank_method pointwise

# pairwise:
--rerank_method pairwise

# listwise:
--rerank_method listwise \
--listwise_window 5 \
--listwise_stride 5
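The two listwise options can be read as a window size and a step between consecutive windows over the candidate list. Below is a minimal sketch of how such windows could be laid out, assuming the common back-to-front sliding-window scheme for listwise reranking. `window_spans` is a hypothetical helper, and the evaluation script's own traversal may differ.

```python
def window_spans(n_candidates: int, window: int, stride: int) -> list[tuple[int, int]]:
    """Return (start, end) index pairs, moving from the tail of the list to its head."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    spans = []
    end = n_candidates
    while end > 0:
        start = max(0, end - window)  # clamp the last window at the head of the list
        spans.append((start, end))
        if start == 0:
            break
        end -= stride
    return spans


# With window == stride the windows do not overlap.
print(window_spans(10, 5, 5))
# With stride < window, neighbouring windows share candidates.
print(window_spans(10, 4, 2))
```

When the stride is smaller than the window, a candidate ranked highly in one window can be carried into the next one, at the cost of more model calls.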
