Sample source code and models for our AIED 2023 paper: Towards Enriched Controllability for Educational Question Generation
Abstract: Question Generation (QG) is a task within Natural Language Processing (NLP) that involves automatically generating questions given an input, typically composed of a text and a target answer. Recent work on QG aims to control the type of generated questions so that they meet educational needs. A remarkable example of controllability in educational QG is the generation of questions underlying certain narrative elements, e.g., causal relationship, outcome resolution, or prediction. This study aims to enrich controllability in QG by introducing a new guidance attribute: question explicitness. We propose to control the generation of explicit and implicit wh-questions from children-friendly stories. We show preliminary evidence of controlling QG via question explicitness alone and simultaneously with another target attribute: the question's narrative element.
Authors: Bernardo Leite, Henrique Lopes Cardoso
- Training, inference and evaluation scripts for controllable QG
- Fine-tuned QG T5 models for controllable QG (to be done)
Python 3 (tested with version 3.9.13 on Windows 10)
- Clone this project:
```
git clone https://github.com/bernardoleite/question-generation-control
```
- Install the Python packages from `requirements.txt`. If you are using a virtual environment for Python package management, you can install all required packages with the following bash commands:
```
cd question-generation-control/
pip install -r requirements.txt
```
You can use this code for data preparation, training, inference/prediction, and evaluation.
Current experiments use the FairytaleQA dataset, so the next steps are specifically intended for preparing this dataset.
- Example for preparing the (original) FairytaleQA dataset:
  - Create a `FairytaleQA_Dataset` folder inside the `data` folder
  - Download the files and folders from here and place them inside the `data/FairytaleQA_Dataset` folder
  - Run `src/data/do_splits_processed_gen.py` (data prepared for the QA and QG baselines, see paper)
  - Check that the `data/splits_processed_gen/` folder has been created
  - Run `src/data/do_splits_processed_ctrl_sk_a.py` (data prepared for controlled train/dev/test, see paper)
  - Check that the `data/processed_ctrl_sk_a/` folder has been created (a quick sanity check is sketched below)
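After the preparation steps, you can sanity-check one of the generated splits. Below is a minimal sketch, assuming the split is a plain JSON file (the exact record schema is an assumption; confirm it by inspecting the output):

```python
import json

# path as referenced by the training scripts; adjust if your layout differs
with open("data/FairytaleQA_Dataset/processed_ctrl_sk_a/train.json", encoding="utf-8") as f:
    train_data = json.load(f)

print(type(train_data))                        # top-level structure of the split
print(json.dumps(train_data, indent=2)[:500])  # peek at the first few records
```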
- Go to `src/model`. The file `train.py` is responsible for the training routine. Type the following command to read the description of the parameters:
```
python train.py -h
```
You can also run the example training script (Linux and macOS) `train_qg_t5_base_512_128_32_10_skill-text_question-answer_seed_44.sh`:
```
bash train_qg_t5_base_512_128_32_10_skill-text_question-answer_seed_44.sh
```
The previous script will start the training routine with predefined parameters:
```bash
#!/usr/bin/env bash

for ((i=44; i <= 44; i++))
do
    taskset --cpu-list 1-24 python train.py \
        --dir_model_name "qg_t5_base_512_128_32_10_skill-text_question-answer_seed_${i}" \
        --model_name "t5-base" \
        --tokenizer_name "t5-base" \
        --train_path "../../data/FairytaleQA_Dataset/processed_ctrl_sk_a/train.json" \
        --val_path "../../data/FairytaleQA_Dataset/processed_ctrl_sk_a/val.json" \
        --test_path "../../data/FairytaleQA_Dataset/processed_ctrl_sk_a/test.json" \
        --max_len_input 512 \
        --max_len_output 128 \
        --encoder_info "skill_text" \
        --decoder_info "question_answer" \
        --batch_size 32 \
        --max_epochs 10 \
        --patience 2 \
        --optimizer "AdamW" \
        --learning_rate 0.0001 \
        --epsilon 0.000001 \
        --num_gpus 1 \
        --seed_value ${i}
done
```
- In the end, model checkpoints will be available at `checkpoints/checkpoint-name`.
- Similarly, you can run the example training script (Linux and macOS) `train_qg_t5_base_512_128_32_10_answertype-text_question-answer_seed_44.sh`:
```
bash train_qg_t5_base_512_128_32_10_answertype-text_question-answer_seed_44.sh
```
This script starts the same training routine, but with `--encoder_info "answertype_text"`:
```bash
#!/usr/bin/env bash

for ((i=44; i <= 44; i++))
do
    taskset --cpu-list 1-24 python train.py \
        --dir_model_name "qg_t5_base_512_128_32_10_answertype-text_question-answer_seed_${i}" \
        --model_name "t5-base" \
        --tokenizer_name "t5-base" \
        --train_path "../../data/FairytaleQA_Dataset/processed_ctrl_sk_a/train.json" \
        --val_path "../../data/FairytaleQA_Dataset/processed_ctrl_sk_a/val.json" \
        --test_path "../../data/FairytaleQA_Dataset/processed_ctrl_sk_a/test.json" \
        --max_len_input 512 \
        --max_len_output 128 \
        --encoder_info "answertype_text" \
        --decoder_info "question_answer" \
        --batch_size 32 \
        --max_epochs 10 \
        --patience 2 \
        --optimizer "AdamW" \
        --learning_rate 0.0001 \
        --epsilon 0.000001 \
        --num_gpus 1 \
        --seed_value ${i}
done
```
- In the end, model checkpoints will be available at `checkpoints/checkpoint-name`.
Note: You can set the `encoder_info` parameter as follows (an illustrative sketch follows this list):
- `skill_text`: control the question's narrative elements
- `answertype_text`: control the question's explicitness
- `skill_answertype_text`: control the question's explicitness and narrative elements simultaneously
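For intuition, the sketch below shows one way control attributes can be prepended to the encoder input. This is a hypothetical illustration, not the repository's actual format: the tag names, the `build_encoder_input` helper, and the example values are all assumptions; the real input construction lives in `src/model/train.py`.

```python
# hypothetical sketch of attribute-conditioned encoder input; the real
# format used by train.py may differ -- check the code before relying on it
def build_encoder_input(text, skill=None, answertype=None):
    parts = []
    if skill is not None:        # narrative element, e.g. "causal relationship"
        parts.append(f"<skill> {skill}")
    if answertype is not None:   # question explicitness: "explicit" or "implicit"
        parts.append(f"<answertype> {answertype}")
    parts.append(f"<text> {text}")
    return " ".join(parts)

print(build_encoder_input("Once upon a time ...", skill="prediction", answertype="implicit"))
# <skill> prediction <answertype> implicit <text> Once upon a time ...
```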
- Go to `src/model`. The script file `inference_qg_t5_base_512_128_32_10_skill-text_question-answer_seed_44.sh` is an example of the inference routine. The predictions will be available at `predictions/checkpoint-name`. The folder contains the model predictions (`predictions.json`) and the parameters used (`params.json`).
Important note: In the `checkpoint_model_path` and `predictions_save_path` parameters, replace XX and YY with the epoch number and loss of the checkpoint you want to use.
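If you want to experiment outside the provided scripts, the general generation pattern for a `t5-base`-style model with Hugging Face Transformers looks like the sketch below. This is not the repository's inference code: the model directory, the control prefix, and the decoding settings are assumptions, and a checkpoint saved by `train.py` may first need to be exported to a Transformers-loadable directory.

```python
# minimal T5 generation sketch (generic Hugging Face pattern, not the
# repository's inference code); the control prefix is illustrative only
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_dir = "t5-base"  # hypothetical: point this at an exported fine-tuned model
tokenizer = T5Tokenizer.from_pretrained(model_dir)
model = T5ForConditionalGeneration.from_pretrained(model_dir)

source = "<answertype> implicit <text> Once upon a time, a fox lived in the forest ..."
inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```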
- Go to `src/model`. The script file `inference_qg_t5_base_512_128_32_10_answertype-text_question-answer_seed_44.sh` will generate both questions and answers.
- You need to create a QA system for answering the generated questions. Go to `src/model` and run `train_qa.sh`.
- Go to `src/model`. The script file `inference_qa_questiongen_t5_base_512_128_32_10_answertype-text_question-answer_seed_44.sh` makes the QA system answer the generated questions (a comparison sketch follows below).
Important note: In the `checkpoint_model_path` and `predictions_save_path` parameters, replace XX, YY, KK, and ZZ with the epoch numbers and losses of your checkpoints.
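The point of this pipeline is a round-trip check: the answer the QG model generated alongside each question can be compared with the answer the QA system gives to that same question. A minimal sketch of that comparison, assuming both prediction files are JSON lists and using hypothetical field names (`gen_question`, `gen_answer`) that you should replace with the actual keys in your `predictions.json`:

```python
import json

# hypothetical paths and field names -- inspect your own predictions.json
# files for the real keys before running this
with open("predictions/qg-checkpoint-name/predictions.json", encoding="utf-8") as f:
    qg_preds = json.load(f)
with open("predictions/qa-checkpoint-name/predictions.json", encoding="utf-8") as f:
    qa_preds = json.load(f)

for qg, qa in zip(qg_preds, qa_preds):
    print("Question:  ", qg["gen_question"])
    print("QG answer: ", qg["gen_answer"])
    print("QA answer: ", qa["gen_answer"])
    break  # show only the first pair
```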
- For QG evaluation, you first need to install/configure Rouge and BLEURT.
- Open the `src/eval-qg.py` file.
- Inspect the `preds_path` list and choose which predictions to evaluate (remove or add entries). The current entries are the predictions reported in the paper.
- Run `src/eval-qg.py` to compute the evaluation scores (a standalone ROUGE example follows below).

Note: In our experiments, BLEURT took too long to compute the scores, so we have commented out the computation and output of BLEURT values in the code. If you still want to compute BLEURT, update `bleurt_checkpoint` and uncomment the BLEURT lines.
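As a standalone illustration of the lexical-overlap scoring involved (not the pipeline in `eval-qg.py`; the `rouge-score` package and the sample strings are assumptions), ROUGE-L between a reference and a generated question can be computed like this:

```python
# standalone ROUGE-L check with the rouge-score package
# (pip install rouge-score); eval-qg.py has its own pipeline
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
reference = "Why did the fox climb the tree?"
generated = "Why did the fox go up the tree?"
scores = scorer.score(reference, generated)
print(scores["rougeL"].fmeasure)  # F1 over the longest common subsequence
```

The same idea applies to the QA evaluation below.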
- For QA evaluation, you first need to install/configure Rouge.
- Open the `src/eval-qa.py` file.
- Inspect the `preds_path` list and choose which predictions to evaluate (remove or add entries). The current entries are the predictions reported in the paper.
- Run `src/eval-qa.py` to compute the evaluation scores.
To ask questions, report issues or request features, please use the GitHub Issue Tracker.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks in advance!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is released under the MIT license. For details, please see the file LICENSE in the root directory.
A commercial license may also be available for use in industrial projects, collaborations or distributors of proprietary software that do not wish to use an open-source license. Please contact the author if you are interested.
The base code is adapted from a previous implementation.
- Bernardo Leite, bernardo.leite@fe.up.pt
- Henrique Lopes Cardoso, hlc@fe.up.pt