- Task of translating natural language queries into regular expressions without using domain-specific knowledge.
- Proposes a methodology for collecting a large corpus of (regular expression, natural language) pairs.
- Reports a performance gain of 19.6% over state-of-the-art models.
- Link to the paper
- LSTM-based sequence-to-sequence neural network (with attention); a minimal sketch follows this list.
  - Six layers:
    - One word-embedding layer
    - Two encoder layers
    - Two decoder layers
    - One dense output layer
  - Attention over the encoder layers.
  - Dropout with a probability of 0.25.
  - Trained for 20 epochs with a minibatch size of 32 and a learning rate of 1 (with a decay rate of 0.5).
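
A minimal PyTorch sketch of this architecture, not the authors' implementation: embedding and hidden sizes are placeholder values, the attention is simple dot-product attention, and a separate target-side embedding table is added for convenience (the paper counts a single word-embedding layer).

```python
import torch
import torch.nn as nn

class DeepRegexSketch(nn.Module):
    """Six-layer LSTM seq2seq with attention, as described in the notes above."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hid_dim=256, p_drop=0.25):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb_dim)      # word-embedding layer
        self.tgt_embed = nn.Embedding(tgt_vocab, emb_dim)      # (extra table, an assumption)
        self.encoder = nn.LSTM(emb_dim, hid_dim, num_layers=2,  # two encoder layers
                               dropout=p_drop, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, num_layers=2,  # two decoder layers
                               dropout=p_drop, batch_first=True)
        self.attn_combine = nn.Linear(2 * hid_dim, hid_dim)
        self.out = nn.Linear(hid_dim, tgt_vocab)                # dense output layer
        self.drop = nn.Dropout(p_drop)                          # dropout, p = 0.25

    def forward(self, src, tgt):
        enc_out, state = self.encoder(self.drop(self.src_embed(src)))
        dec_out, _ = self.decoder(self.drop(self.tgt_embed(tgt)), state)
        # Dot-product attention over the encoder outputs.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))       # (B, T, S)
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_out)
        combined = torch.tanh(self.attn_combine(torch.cat([dec_out, context], dim=-1)))
        return self.out(combined)                                  # (B, T, tgt_vocab)
```

Per the hyperparameters listed above, training could use `torch.optim.SGD(model.parameters(), lr=1.0)` with a scheduler that halves the rate (e.g. `torch.optim.lr_scheduler.StepLR` with `gamma=0.5`); the optimizer choice itself is an assumption.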
- Created a public dataset, NL-RX, with 10K (regular expression, natural language) pairs.
- Two-step generate-and-paraphrase approach:
  - Generate step: use a handcrafted grammar to translate regular expressions into (rigid) natural language descriptions; a toy example follows this list.
  - Paraphrase step: crowdsource the task of rewriting the rigid descriptions into more natural language.
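
A toy illustration of the generate step; the grammar rules below are invented stand-ins for the paper's handcrafted grammar, each production pairing a regex fragment with a rigid English fragment.

```python
import random

TERMINALS = [("[0-9]", "a number"), ("[a-z]", "a lower-case letter")]
COMBINATORS = [
    ("{a}.*",       "lines starting with {a}"),
    (".*{a}",       "lines ending with {a}"),
    ("({a})|({b})", "either {a} or {b}"),
]

def generate(depth=2):
    """Recursively sample a (regex, rigid description) pair from the grammar."""
    if depth == 0 or random.random() < 0.4:
        return random.choice(TERMINALS)
    rx_tpl, nl_tpl = random.choice(COMBINATORS)
    a, b = generate(depth - 1), generate(depth - 1)
    return rx_tpl.format(a=a[0], b=b[0]), nl_tpl.format(a=a[1], b=b[1])

regex, rigid = generate()
print(regex, "->", rigid)  # the rigid description is then paraphrased by crowdworkers
```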
- Evaluation Metric
  - Functional equality check (called DFA-Equal), since the same regular expression can be written in many syntactically different ways; a rough approximation is sketched below.
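
The paper's DFA-Equal checks true DFA equivalence; as a cheap, runnable stand-in, one can compare two regexes exhaustively over all strings up to a small length (the alphabet and length bound here are arbitrary choices, and Python's `re` is only regular for backreference-free patterns).

```python
import itertools
import re

def approx_dfa_equal(rx1, rx2, alphabet="ab01", max_len=4):
    """Approximate functional equality: accept the pair as equivalent if both
    regexes match exactly the same strings up to max_len over the alphabet."""
    p1, p2 = re.compile(rx1), re.compile(rx2)
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(p1.fullmatch(s)) != bool(p2.fullmatch(s)):
                return False
    return True

assert approx_dfa_equal("a+", "aa*")        # same language, different syntax
assert not approx_dfa_equal("ab*", "a*b")   # differ, e.g. on the string "b"
```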
- The proposed architecture outperforms both baselines: a Nearest Neighbor classifier using Bag of Words (BoW-NN) and Semantic-Unify. A sketch of the BoW-NN baseline follows.
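
A plausible scikit-learn sketch of the BoW-NN baseline; the vectorizer settings and cosine distance are assumptions, not details from the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NearestNeighbors

def bow_nn_predict(train_nl, train_rx, test_nl):
    """For each test description, return the regex paired with its nearest
    training description under a bag-of-words representation."""
    vec = CountVectorizer()
    X = vec.fit_transform(train_nl)
    nn = NearestNeighbors(n_neighbors=1, metric="cosine").fit(X)
    _, idx = nn.kneighbors(vec.transform(test_nl))
    return [train_rx[i[0]] for i in idx]

# Example with toy data:
preds = bow_nn_predict(
    ["lines starting with a number", "lines ending with a letter"],
    ["[0-9].*", ".*[a-z]"],
    ["lines that start with a number"],
)
print(preds)  # -> ['[0-9].*']
```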