ReaSCAN: Compositional Reasoning in Language Grounding

ReaSCAN is a synthetic navigation task that requires models to reason about surroundings over syntactically difficult languages.

Citation

Zhengxuan Wu, Elisa Kreiss, Desmond C. Ong, and Christopher Potts. 2021. ReaSCAN: Compositional Reasoning in Language Grounding. NeurIPS 2021 Datasets and Benchmarks Track.

  @article{wu-etal-2021-reascan,
    title={{ReaSCAN}: Compositional Reasoning in Language Grounding},
    author={Wu, Zhengxuan and Kreiss, Elisa and Ong, Desmond C. and Potts, Christopher},
    journal={},
    url={},
    year={2021}}

Example

Four command-world pairs for different command patterns. Our Simple command pattern is equivalent to gSCAN. RD means distractors are randomly sampled. Referent targets are shaded in red, distractors are shaded in blue, and both are highlighted with green dashed lines.

Dataset

Off-the-shelf ReaSCAN

We generated ReaSCAN using our pipeline with fixed random seeds. You can reproduce the version of ReaSCAN used in the paper by running the pipeline. Additionally, we have uploaded that version to an online folder, where you can download it directly and use it as-is. Note that the dataset files are quite large, so downloading them may take a while.

Our generated data is in ReaSCAN-v1.0.zip, which is saved in a shared drive. The dataset consists of subsets generated for different command patterns (P1: Simple (similar to gSCAN), P2: 1-relative-clause, P3: 2-relative-clauses, P4: 3-relative-clauses) and different compositional splits (see our paper for details about each split).
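
Once downloaded, the archive can be extracted with a couple of lines of Python (a minimal sketch, assuming the file is saved as ReaSCAN-v1.0.zip in the working directory):

import zipfile

# Extract the downloaded archive into a local folder.
# Assumes ReaSCAN-v1.0.zip was downloaded from the shared drive.
with zipfile.ZipFile("ReaSCAN-v1.0.zip", "r") as archive:
    archive.extractall("ReaSCAN-v1.0")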

By patterns,

  • ReaSCAN-compositional: all ReaSCAN commands, containing train, dev, and test sets.
  • ReaSCAN-compositional-p1: ReaSCAN Simple set, containing train, dev, and test sets.
  • ReaSCAN-compositional-p2: ReaSCAN 1-relative-clause set, containing train, dev, and test sets.
  • ReaSCAN-compositional-p3: ReaSCAN 2-relative-clauses set, containing train, dev, and test sets.
  • ReaSCAN-compositional-p1-test: ReaSCAN Simple set, containing test set only.
  • ReaSCAN-compositional-p2-test: ReaSCAN 1-relative-clause set, containing test set only.
  • ReaSCAN-compositional-p3-test: ReaSCAN 2-relative-clauses set, containing test set only.

By splits,

  • ReaSCAN-compositional-a1: ReaSCAN A1 compositional split, containing test set only.
  • ReaSCAN-compositional-a2: ReaSCAN A2 compositional split, containing test set only.
  • ReaSCAN-compositional-a3: ReaSCAN A3 compositional split, containing test set only.
  • ReaSCAN-compositional-b1: [WARNING] This split is deprecated! Do not use!
  • ReaSCAN-compositional-b2: ReaSCAN B compositional split, containing test set only.
  • ReaSCAN-compositional-c: ReaSCAN C compositional split, containing test set only.

Note that our A1 and A2 splits are similar to gSCAN's splits B and C in setup, but differ in what they test. We plan to support the gSCAN splits in future releases as well. In fact, you can also generate your own compositional splits by modifying a couple of lines in code/dataset/generate_ReaSCAN_splits.ipynb.

Special split,

  • ReaSCAN-compositional-p3-rd: ReaSCAN 2-relative-clauses set with random distractors, containing train, dev and test sets.
  • ReaSCAN-compositional-p4 or ReaSCAN-compositional-p4-test: ReaSCAN 3-relative-clauses set, containing test set only.

Regenerate ReaSCAN

You can recreate the ReaSCAN data shared above using the provided scripts. Since generating a full-fledged dataset can take a long time, you can use our multi-process generator, which can generate any subset included in our paper within 20 minutes using 50 processes. Below is example code we used to generate the 2-relative-clauses dataset. For the exact scripts we used to generate the dataset in the paper, refer to code/experiments.sh.

Single process generation,

cd code/dataset

python generate_ReaSCAN.py \
--mode train \
--n_command_struct 100 \
--date 2021-05-30 \
--grid_size 6 \
--n_object_max 13 \
--per_command_world_retry_max 500 \
--per_command_world_target_count 3 \
--output_dir ./ReaSCAN-compositional-demo/ \
--include_relation_distractor \
--include_attribute_distractor \
--include_isomorphism_distractor \
--include_random_distractor \
--full_relation_probability 1.0 \
--command_pattern p3 \
--save_interal 200

Multi-process generation,

cd code/dataset

python generate_ReaSCAN_batch.py

Note that you need to go into the file and modify some variables to generate the dataset you want.

Dataset format

Loading ReaSCAN

Once you have generated the dataset .txt file (in JSON format), you can load any dataset as follows:

import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

path_to_data = "data-compositional-splits.txt"
logger.info(f"Reading dataset from file: {path_to_data}...")
data_json = json.load(open(path_to_data, "r"))

print(data_json["examples"].keys())

We keep our format the same as gSCAN's. For each example, we provide the command and the world representation. Additionally, we provide ReaSCAN-specific metadata.

Below is the first example in the ReaSCAN-compositional-p3-test split.

{
                "command": "pull,a,small,object,that,is,in,the,same,column,as,a,green,cylinder,and,in,the,same,shape,as,a,small,red,object,cautiously",
                "grammer_pattern": "$OBJ_0 ^ $OBJ_1 & $OBJ_2",
                "meaning": "pull,a,small,object,that,is,in,the,same,column,as,a,green,cylinder,and,in,the,same,shape,as,a,small,red,object,cautiously",
                "derivation": "$OBJ_0 ^ $OBJ_1 & $OBJ_2",
                "situation": {
                    "grid_size": 6,
                    "agent_position": {
                        "row": "1",
                        "column": "1"
                    },
                    "agent_direction": 0,
                    "target_object": {
                        "vector": "010010000001",
                        "position": {
                            "row": "2",
                            "column": "3"
                        },
                        "object": {
                            "shape": "circle",
                            "color": "yellow",
                            "size": "2"
                        }
                    },
                    "distance_to_target": "3",
                    "direction_to_target": "se",
                    "placed_objects": {
                        "0": {
                            "vector": "010010000001",
                            "position": {
                                "row": "2",
                                "column": "3"
                            },
                            "object": {
                                "shape": "circle",
                                "color": "yellow",
                                "size": "2"
                            }
                        },
                        "1": {
                            "vector": "001001000010",
                            "position": {
                                "row": "0",
                                "column": "3"
                            },
                            "object": {
                                "shape": "cylinder",
                                "color": "green",
                                "size": "3"
                            }
                        },
                        "2": {
                            "vector": "010010001000",
                            "position": {
                                "row": "3",
                                "column": "0"
                            },
                            "object": {
                                "shape": "circle",
                                "color": "red",
                                "size": "2"
                            }
                        },
                        "3": {
                            "vector": "100000100100",
                            "position": {
                                "row": "3",
                                "column": "2"
                            },
                            "object": {
                                "shape": "square",
                                "color": "blue",
                                "size": "1"
                            }
                        },
                        "4": {
                            "vector": "010010001000",
                            "position": {
                                "row": "5",
                                "column": "5"
                            },
                            "object": {
                                "shape": "circle",
                                "color": "red",
                                "size": "2"
                            }
                        },
                        "5": {
                            "vector": "100001001000",
                            "position": {
                                "row": "3",
                                "column": "4"
                            },
                            "object": {
                                "shape": "cylinder",
                                "color": "red",
                                "size": "1"
                            }
                        },
                        "6": {
                            "vector": "001001000010",
                            "position": {
                                "row": "0",
                                "column": "4"
                            },
                            "object": {
                                "shape": "cylinder",
                                "color": "green",
                                "size": "3"
                            }
                        },
                        "7": {
                            "vector": "010000101000",
                            "position": {
                                "row": "4",
                                "column": "3"
                            },
                            "object": {
                                "shape": "square",
                                "color": "red",
                                "size": "2"
                            }
                        },
                        "8": {
                            "vector": "010001000001",
                            "position": {
                                "row": "1",
                                "column": "3"
                            },
                            "object": {
                                "shape": "cylinder",
                                "color": "yellow",
                                "size": "2"
                            }
                        },
                        "9": {
                            "vector": "100001001000",
                            "position": {
                                "row": "1",
                                "column": "5"
                            },
                            "object": {
                                "shape": "cylinder",
                                "color": "red",
                                "size": "1"
                            }
                        },
                        "10": {
                            "vector": "001010001000",
                            "position": {
                                "row": "3",
                                "column": "5"
                            },
                            "object": {
                                "shape": "circle",
                                "color": "red",
                                "size": "3"
                            }
                        },
                        "11": {
                            "vector": "001010001000",
                            "position": {
                                "row": "0",
                                "column": "1"
                            },
                            "object": {
                                "shape": "circle",
                                "color": "red",
                                "size": "3"
                            }
                        },
                        "12": {
                            "vector": "001001000001",
                            "position": {
                                "row": "5",
                                "column": "0"
                            },
                            "object": {
                                "shape": "cylinder",
                                "color": "yellow",
                                "size": "3"
                            }
                        },
                        "13": {
                            "vector": "001000100010",
                            "position": {
                                "row": "0",
                                "column": "0"
                            },
                            "object": {
                                "shape": "square",
                                "color": "green",
                                "size": "3"
                            }
                        },
                        "14": {
                            "vector": "100001001000",
                            "position": {
                                "row": "5",
                                "column": "1"
                            },
                            "object": {
                                "shape": "cylinder",
                                "color": "red",
                                "size": "1"
                            }
                        }
                    },
                    "carrying_object": null
                },
                "target_commands": "turn left,turn right,turn right,turn left,walk,turn left,turn right,turn right,turn left,walk,turn right,turn left,turn right,turn right,turn left,walk",
                "verb_in_command": "pull",
                "adverb_in_command": "cautiously",
                "referred_target": "small object",
                "object_pattern_map": {
                    "$OBJ_0": "$SIZE $ABS_SHAPE",
                    "$OBJ_1": "$COLOR $SHAPE",
                    "$OBJ_2": "$SIZE $COLOR $ABS_SHAPE"
                },
                "relation_map": [
                    [
                        [
                            "$OBJ_0",
                            "$OBJ_1"
                        ],
                        "$SAME_COLUMN"
                    ],
                    [
                        [
                            "$OBJ_0",
                            "$OBJ_2"
                        ],
                        "$SAME_SHAPE"
                    ]
                ],
                "object_expression": {
                    "$OBJ_0": "small object",
                    "$OBJ_1": "green cylinder",
                    "$OBJ_2": "small red object"
                },
                "n_object": 15,
                "n_distractor": 12,
                "full_relation_distractor": true,
                "has_relation_distractor": true,
                "has_attribute_distractor": true,
                "has_isomorphism_distractor": true,
                "has_random_distractor": false,
                "n_random_distractor": 0,
                "relation_distractor_metadata": [
                    {
                        "distractor_metadata": {
                            "edge": [
                                "$OBJ_0",
                                "$OBJ_1"
                            ],
                            "relation_old_type": "$SAME_COLUMN",
                            "full_set": true
                        }
                    },
                    {
                        "distractor_metadata": {
                            "edge": [
                                "$OBJ_0",
                                "$OBJ_2"
                            ],
                            "relation_old_type": "$SAME_SHAPE",
                            "full_set": true
                        }
                    }
                ],
                "attribute_distractor_metadata": [
                    {
                        "distractor_metadata": [
                            {
                                "modified_obj": "$OBJ_1",
                                "modified_attribute": "$COLOR"
                            }
                        ]
                    }
                ],
                "isomorphism_distractor_metadata": [
                    {
                        "distractor_metadata": [
                            {
                                "swapped_pair": [
                                    "$OBJ_1",
                                    "$OBJ_2"
                                ],
                                "before_pair_obj_str": [
                                    "green cylinder",
                                    "small red object"
                                ],
                                "after_pair_obj_str": [
                                    "small green object",
                                    "red cylinder"
                                ],
                                "size_shuffled": true,
                                "color_shuffled": false,
                                "shape_shuffled": true
                            }
                        ]
                    }
                ],
                "random_distractor_metadata": [
                    {}
                ]
            }

This is one example from the dataset. It contains the "command" (the input instruction), 'pull,a,small,object,that,is,in,the,same,column,as,a,green,cylinder,and,in,the,same,shape,as,a,small,red,object,cautiously', with tokens separated by commas. For the specified world state (the "situation"), this command maps to the "target_commands": "turn left,turn right,turn right,turn left,walk,turn left,turn right,turn right,turn left,walk,turn right,turn left,turn right,turn right,turn left,walk". The example contains the situation representation, or world state, under the key "situation", plus additional metadata about how the world was generated, for example how the distractors were constructed (see the fields in relation_distractor_metadata).
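
To make this structure concrete, here is a minimal loading sketch that walks the examples in one split and pulls out the core fields above (it assumes the gSCAN-style layout, where data_json["examples"] maps split names to lists of example dicts):

import json

# Walk each split and pull out the core fields of its first example.
# Assumes the gSCAN-style layout where "examples" maps split names
# (e.g., "train", "dev", "test") to lists of example dicts.
data_json = json.load(open("data-compositional-splits.txt", "r"))

for split, examples in data_json["examples"].items():
    example = examples[0]                            # peek at the first example
    command = example["command"].split(",")          # input instruction tokens
    target = example["target_commands"].split(",")   # output action sequence
    situation = example["situation"]                 # world state dict
    print(split, example["verb_in_command"], len(target))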

To be more compatible with other models, we also provide a translation script that converts each example into a compressed dictionary containing all the information needed to train a neural model (i.e., the input is a command sequence plus a tensor representation of the shape world, and the output is an action sequence). To convert, you can refer to the following steps:

cd code/models/gSCAN_with_language_conditioned_embedding

jupyter notebook

# open this file: read_reascan.ipynb

Following the steps in this notebook, each example will be translated into a data structure like the following.

Compact version of a ReaSCAN example, ready to use with any neural model.

{"input": ["pull", "a", "big", "yellow", "square", "that", "is", "in", "the", "same", "row", "as", "a", "small", "blue", "circle", "and", "in", "the", "same", "column", "as", "a", "big", "green", "cylinder"], "target": ["turn left", "turn left", "walk", "walk", "walk", "walk", "turn left", "walk", "walk", "pull", "pull", "pull", "pull"], "situation": [[[0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]], [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], [[0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], [[0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], [[0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]]]}

Note that the situation is a tensor representation of the shape world. Each sub-list represents one cell in the world, encoding which object occupies that position according to the following layout:

"""
Each grid cell in a situation is fully specified by a vector:
[_ _ _ _ _ _ _   _       _      _      _    _   _ _ _ _]
 1 2 3 4 r g b circle square cylinder box agent E S W N
 _______ _____ ______________________ _____ _______
   size  color        shape           agent agent dir.
:param situation_representation: data from dataset.txt at key "situation".
:param grid_size: int determining row/column number.
:return: grid to be parsed by computational models.
"""

If objects overlap in a single cell, we add their vectors together. This only happens when an object sits inside a box and is at the box's upper-left corner. There are many other ways to represent this situation, but we take the simplest approach.
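
For illustration, here is a rough sketch of how such a grid could be assembled from the raw "situation" dict, reusing the per-object "vector" strings from the dataset and appending agent slots. This is a sketch only; the official conversion lives in read_reascan.ipynb and may differ in slot ordering and dimensionality:

import numpy as np

def situation_to_grid(situation, grid_size=6):
    """Sketch: build a (grid_size, grid_size, 17) array from a "situation" dict.

    Each object contributes its precomputed one-hot "vector" (size/shape/color,
    12 dims in the example data above); the last 5 slots mark the agent and its
    facing direction. Overlapping objects (e.g., one inside a box) are summed.
    """
    obj_dim = 12  # length of the "vector" strings in the dataset
    grid = np.zeros((grid_size, grid_size, obj_dim + 5))
    for obj in situation["placed_objects"].values():
        r, c = int(obj["position"]["row"]), int(obj["position"]["column"])
        grid[r, c, :obj_dim] += [int(bit) for bit in obj["vector"]]
    ar = int(situation["agent_position"]["row"])
    ac = int(situation["agent_position"]["column"])
    grid[ar, ac, obj_dim] = 1  # agent marker
    grid[ar, ac, obj_dim + 1 + int(situation["agent_direction"])] = 1  # direction
    return grid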

Models

We use two existing models and adapt their code to benchmark ReaSCAN. Both models were published and evaluated on gSCAN. Other than hyperparameter tuning, we do not change the model architectures.

Multimodal LSTM

This model was published with gSCAN; see the original paper and repo for details about the model. We have already adapted the interface changes needed to run it on ReaSCAN, so you can start training with the following commands:

cd code/models/seq2seq

CUDA_VISIBLE_DEVICES=0 python run_reascan.py \
--mode=train \
--max_decoding_steps=120 \
--max_testing_examples=2000 \
--data_directory=ReaSCAN-compositional-p1 \
--input_vocab_path=input_vocabulary.txt \
--target_vocab_path=target_vocabulary.txt \
--attention_type=bahdanau \
--no_auxiliary_task \
--conditional_attention \
--output_directory=./training_logs/p1-random-seed-44 \
--training_batch_size=2000 \
--max_training_iterations=200000 \
--seed=44

Note that this requires you to generate the vocabulary files beforehand to save time. You can do so by following the scripts provided in the notebook ReaSCAN-vocab-generator.ipynb in the same folder.
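
If you just want to see where the vocabulary comes from, the token inventories can be collected directly from a generated split. This sketch only shows the token sources; the on-disk vocabulary format that run_reascan.py expects is defined in ReaSCAN-vocab-generator.ipynb:

import json

# Sketch: collect input/target token inventories from a generated split.
# This shows where the tokens come from, not the exact vocabulary file
# format expected by run_reascan.py (see ReaSCAN-vocab-generator.ipynb).
data_json = json.load(open("data-compositional-splits.txt", "r"))

input_tokens, target_tokens = set(), set()
for examples in data_json["examples"].values():
    for ex in examples:
        input_tokens.update(ex["command"].split(","))
        target_tokens.update(ex["target_commands"].split(","))

print(sorted(input_tokens))
print(sorted(target_tokens))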

To evaluate this model, you need to run the evaluation script to generate all predictions. Note that we follow the original repo; you can refer to their code for your own implementations. This is the script we run:

cd code/models/seq2seq

CUDA_VISIBLE_DEVICES=0 python run_reascan.py \
 --mode=test \
 --data_directory=../../../data-files-updated/ReaSCAN-compositional-p1/ \
 --input_vocab_path=input_vocabulary.txt \
 --target_vocab_path=target_vocabulary.txt \
 --attention_type=bahdanau \
 --no_auxiliary_task \
 --conditional_attention \
 --output_directory=../../../testing_logs/p1-random-seed-44/  \
 --resume_from_file=../../../training_logs/p1-random-seed-44/model_best.pth.tar \
 --splits=dev \
 --output_file_name=p1-random-seed-44.json \
 --max_decoding_steps=120

Note that this evaluates on --splits=dev; change it to --splits=test if you want to evaluate on the test split.

After this script finishes, the predictions are written to the output file in the output directory. You can then analyze the results by running the notebook performance-analysis.ipynb in the model folder.
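
For a quick sanity check outside the notebook, exact-match accuracy can be computed directly from the predictions file. The field names below ("prediction", "target") are hypothetical; inspect the JSON emitted by run_reascan.py for the actual keys:

import json

# Hypothetical keys "prediction" and "target" -- check the actual
# output of run_reascan.py before relying on this.
predictions = json.load(open("p1-random-seed-44.json", "r"))

exact_match = sum(ex["prediction"] == ex["target"] for ex in predictions)
print(f"Exact match: {exact_match / len(predictions):.2%}")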

GCN + LSTM

This model was also published with gSCAN; see the original paper and repo for details about the model. We have already adapted the interface changes needed to run it on ReaSCAN, so you can start training with the following commands:

cd code/models/gSCAN_with_language_conditioned_embedding

CUDA_VISIBLE_DEVICES=0 python main_model.py \
--run p1-random-seed-66 \
--data_dir ./parsed_dataset-p1/ \
--seed 44 \
--txt

Note that the script above assumes you have already parsed the dataset using the parsing helpers provided in the notebook read_reascan.ipynb.

After running this script, all model checkpoints will be saved in the output directory. You can then evaluate the model's performance with the following:

cd code/models/gSCAN_with_language_conditioned_embedding

CUDA_VISIBLE_DEVICES=0 python eval_best_model.py \
--load ./output/p1-random-seed-44/model_best.pth.tar \
--data_dir ./parsed_dataset-p1/ \
--seed 44 \
--test_split dev

Note that this evaluates on --test_split dev; change it to --test_split test if you want to evaluate on the test split.

Other files

This repo also provides a number of useful scripts for analyzing ReaSCAN in various ways. Here is a non-comprehensive list of them and their purposes:

  • code/models/seq2seq/performance-analysis.ipynb: evaluate model performance.
  • code/models/seq2seq/ReaSCAN-vocab-generator.ipynb: generate required vocab files.
  • code/models/gSCAN_with_language_conditioned_embedding/read_reascan.ipynb: helper to parse the dataset into model readable format.
  • code/experiments.sh: all bash scripts we run for our experiment results.
  • code/dataset/demo.ipynb: demo file for all components involved in ReaSCAN data generation process.
  • code/dataset/unit_tests.ipynb: unit tests for ReaSCAN. If you want to customize ReaSCAN, please run these unit tests before changing anything.
  • code/dataset/generate_ReaSCAN_splits.ipynb: generate splits for ReaSCAN.
  • code/dataset/ReaSCAN-analysis.ipynb: some analyses we conducted in the paper.

ReaSCAN as Abstract Reasoning Challenges

Two simplified abstract reasoning challenges built with ReaSCAN. The task mimics human abstract reasoning tests: given a set of input-output pairs (input on the left, output on the right), the test taker needs to infer the output for the last input. For each task, we provide one potential abstract rule that solves it.

License

ReaSCAN is released under a Creative Commons Attribution 4.0 International License.
