This is the code base for the ACL'19 paper *Complex Question Decomposition for Semantic Parsing*.
Download the ComplexWebQuestions data and prepare the environment and libraries.
To run preprocessing, put the following files in the `DATA_PATH` directory (`DATA_PATH` is defined in the script):
- ComplexWebQuestions_train.json
- ComplexWebQuestions_dev.json
- ComplexWebQuestions_test.json (we need all the other information contained in these files)
- train.json, dev.json, test.json (we need the split sub-questions in these files; they are not included in the raw data, but we provide generated copies in the `complex_questions` directory, and you can also generate them yourself with the following steps)
- `cd WebAsKB`.
- Prepare a StanfordCoreNLP server on localhost, following the StanfordCoreNLP server instructions in the POS annotation step below.
- Change the `data_dir` setting in `WebAsKB/config.py`.
- Change the `EVALUATION_SET` setting in `WebAsKB/config.py` to `train`, `dev`, and `test` in turn, and run `python webaskb_run.py gen_golden_sup` once for each setting (three runs in total).
- After this, you will have `train.json`, `dev.json`, and `test.json` in `DATA_PATH`; you can sanity-check them as sketched below.
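To sanity-check the generated files, load one and inspect a record. A minimal sketch, assuming the files are JSON arrays of records (the path is a placeholder, and the exact field names inside each record depend on the WebAsKB output):

```python
import json
import os

DATA_PATH = "/path/to/data"  # placeholder: set to your DATA_PATH

# Load the generated split and confirm it parses and is non-empty.
with open(os.path.join(DATA_PATH, "train.json")) as f:
    records = json.load(f)

print(f"{len(records)} records loaded")
print(records[0])  # inspect one record to see the sub-question fields
```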
To run the POS annotation process, download and start a StanfordCoreNLP server on localhost:9003.

- Download https://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip, unzip it, and `cd` into the unzipped directory.
- Start the server with `java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9003 -timeout 15000`.
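Once the server is up, you can verify it from Python. A minimal sketch using the CoreNLP server's HTTP API via the `requests` library (the example sentence is arbitrary):

```python
import json
import requests

# Ask the CoreNLP server on localhost:9003 for POS annotations.
props = {"annotators": "tokenize,ssplit,pos", "outputFormat": "json"}
resp = requests.post(
    "http://localhost:9003/",
    params={"properties": json.dumps(props)},
    data="What movie did the composer of the score direct?".encode("utf-8"),
)
doc = resp.json()

# Print each token with its POS tag.
for sentence in doc["sentences"]:
    for token in sentence["tokens"]:
        print(token["word"], token["pos"])
```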
We provide a template script `scripts/run.sh`; to run it, you need to change at least the following directory settings:

- `DATA_PATH`: the data root directory.
- `RUN_T2T`: the root directory of the code base.

Now run `scripts/run.sh preprocess`; this command generates the data format for our model and annotates POS labels.
Prepare the GloVe pretrained embedding file `glove.6B.300d.txt` and put it in `DATA_PATH/embed/`.
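For reference, each line of `glove.6B.300d.txt` is a word followed by 300 space-separated floats. A minimal sketch of loading it (the path is a placeholder for your `DATA_PATH`):

```python
import numpy as np

def load_glove(path):
    """Load GloVe vectors into a {word: np.ndarray} dict."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

glove = load_glove("/path/to/data/embed/glove.6B.300d.txt")
print(len(glove), glove["question"].shape)  # expect 400000 words, (300,)
```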
`scripts/run.sh prepare` will shuffle the dataset and build the vocabulary file.
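The script handles this step, but conceptually shuffling and vocabulary building look like the following. A rough, hypothetical sketch of the idea, not the project's actual implementation (`max_size` and the special tokens are assumptions):

```python
import random
from collections import Counter

def build_vocab(sentences, max_size=50000):
    # Count tokens across the corpus and keep the most frequent ones.
    counts = Counter(tok for sent in sentences for tok in sent.split())
    vocab = ["<pad>", "<unk>"] + [w for w, _ in counts.most_common(max_size)]
    return {w: i for i, w in enumerate(vocab)}

examples = ["what movie did john direct", "who wrote the score"]
random.shuffle(examples)          # shuffle the dataset
word2id = build_vocab(examples)   # build the vocabulary
```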
To train our decomposition model, use `scripts/run.sh train`.
To train our semantic parsing model, use `scripts/run.sh train_lf`.
Run `scripts/run.sh test` to generate decomposed queries from an input file and print BLEU-4 and ROUGE-L scores against the references.
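If you want to score a single decomposition yourself, BLEU-4 can be computed with NLTK. A minimal sketch (the project's evaluation script may tokenize and aggregate differently):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "what movies did john direct".split()
hypothesis = "what movie did john direct".split()

# BLEU-4: uniform weights over 1- to 4-gram precisions, with smoothing.
score = sentence_bleu(
    [reference], hypothesis,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {score:.4f}")
```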
Run `scripts/run.sh test_lf` to generate logical forms from an input file and print the exact match (EM) score against the references.
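Exact match is simply the fraction of predicted logical forms identical to their references. A minimal sketch (the logical forms shown are made-up placeholders):

```python
def exact_match(predictions, references):
    """Fraction of predictions that exactly equal their reference."""
    assert len(predictions) == len(references)
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

print(exact_match(["(and A B)", "(or C D)"], ["(and A B)", "(or D C)"]))  # 0.5
```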
If you use this code in your research, please cite our paper with the following BibTeX:
@inproceedings{Zhang2019HSP,
  title     = {Complex Question Decomposition for Semantic Parsing},
  author    = {Zhang, Haoyu and Cai, Jingjing and Xu, Jianjun and Wang, Ji},
  booktitle = {Conference of the Association for Computational Linguistics (ACL)},
  year      = {2019}
}