
trishullab/bayou


Bayou

Bayou is a data-driven program synthesis system for Java that uses learned Bayesian specifications for efficient synthesis.

arXiv paper on Bayou.

There are three main modules in Bayou:

  • driver: extracts sketches (in the DSL) and evidences from a Java program to generate the training data
  • model: implements the BED neural network (see paper), word embeddings, their training and inference procedures
  • synthesizer: performs combinatorial enumeration and concretizes a sketch sampled from the BED during inference into a Java program

Requirements

  • JDK 1.8
  • Python 3 (tested with 3.5.1)
  • TensorFlow (tested with 1.2)
  • scikit-learn (tested with 0.18.1)

Compiling and Running Bayou from Source on Ubuntu

1.) Download source from GitHub:

git clone https://github.com/capergroup/bayou.git

2.) Install Build Tools

cd bayou/tool_files/build_scripts
sudo ./install_deps.sh

3.) Compile Bayou

./build.sh

4.) Install Bayou Dependencies

cd out
chmod +x install_dependencies_apt.sh
sudo ./install_dependencies_apt.sh

On macOS, run install_dependencies_mac.sh instead.

5.) Run Bayou

chmod +x start_bayou.sh synthesize.sh
./start_bayou.sh &

Wait until you see:

===================================
            Bayou Ready            
===================================

then execute:

./synthesize.sh

You should see output that ends with something similar to:

/* --- End of application --- */


import edu.rice.bayou.annotations.Evidence;
import java.io.IOException;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.FileNotFoundException;

public class TestIO1 {

    @Evidence(apicalls = {"readLine", "ready"})
    void __bayou_fill(String file) {
		String s1;
		String s;
		boolean b;
		BufferedReader br;
		FileReader fr;
		try {
			fr = new FileReader(file);
			br = new BufferedReader(fr);
			while (b = br.ready()) {
				s = br.readLine();
				s1 = br.readLine();
			}
			br.close();
		} catch (FileNotFoundException _e) {
		} catch (IOException _e) {
		}
	}

}
/* --- End of application --- */

Setup & Usage

Driver

cd /path/to/bayou/src/pl
ant

If you are working with the Android SDK, add android.jar to the classpath:

export CLASSPATH=/path/to/android.jar

If you need one, Bayou includes an android.jar from Android 24 in its lib directory.

After setup, run the tests to make sure everything works:

cd scripts
python3 test_driver.py

Use the following command to run the driver on Program.java with the config file config.json:

java -jar /path/to/bayou/src/pl/out/artifacts/driver/driver.jar -f Program.java -c config.json [-o output.json]

Run the driver with -h for details about the config file. The optional -o option writes the sketch to a JSON file.
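To run the driver over a whole corpus of Java files, a small batch script is enough. The sketch below is a hypothetical Python helper, not part of Bayou: the names `driver_command` and `run_driver_on_dir` are invented here, it uses only the -f, -c and -o options documented above, and the default jar path simply mirrors the command above.

```python
import subprocess
from pathlib import Path

# Assumed jar location (same path as in the command above); adjust to your checkout.
DRIVER_JAR = "/path/to/bayou/src/pl/out/artifacts/driver/driver.jar"


def driver_command(program, config, output, jar=DRIVER_JAR):
    """Build the documented invocation: java -jar driver.jar -f <program> -c <config> -o <output>."""
    return ["java", "-jar", jar,
            "-f", str(program), "-c", str(config), "-o", str(output)]


def run_driver_on_dir(src_dir, config, out_dir, jar=DRIVER_JAR):
    """Run the driver on every *.java file in src_dir, writing <Name>.json into out_dir.

    Returns the programs for which the driver exited with a non-zero status.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failures = []
    for program in sorted(Path(src_dir).glob("*.java")):
        output = out_dir / (program.stem + ".json")
        result = subprocess.run(driver_command(program, config, output, jar))
        if result.returncode != 0:
            failures.append(program)
    return failures
```

For example, `run_driver_on_dir("corpus", "config.json", "json")` would produce one JSON file per program in `json/`, ready to be combined into a single dataset.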

To create a single JSON file with the entire dataset, collect the JSON entity from each program's file into one list and store that list under a top-level key called "programs". For example, if you have the files Program1.json, ..., Program10000.json, the dataset should have the following content:

{
  "programs": [
    <Program1.json>,
    <Program2.json>,
    ...
    <Program10000.json>
  ]
}
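The merge step can be scripted. The following is a minimal sketch using only Python's standard library; `merge_driver_outputs` is a hypothetical name, and it assumes, as described above, that each per-program file holds exactly one JSON entity.

```python
import json


def merge_driver_outputs(json_files, dataset_path):
    """Combine per-program JSON files into one dataset of the form {"programs": [...]}.

    Assumes each input file contains a single JSON entity produced by the driver.
    Returns the number of programs written.
    """
    programs = []
    for path in json_files:
        with open(str(path)) as f:
            programs.append(json.load(f))
    with open(str(dataset_path), "w") as f:
        json.dump({"programs": programs}, f, indent=2)
    return len(programs)
```

For example, `merge_driver_outputs(sorted(Path("json").glob("Program*.json")), "DATA.json")` builds DATA.json from the driver outputs. This version holds every program in memory at once, which keeps it simple; for a very large corpus you may prefer to stream the entries out one at a time.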

Model

First, set the PYTHONPATH environment variable:

cd /path/to/bayou/src/ml
export PYTHONPATH=`pwd`

To extract evidences from a data file DATA.json generated by the driver:

cd /path/to/bayou/src/ml/bayou/core
python3 utils.py DATA.json DATA-with-evidences.json [--max_seqs N] [--max_seq_length M]

This will create DATA-with-evidences.json with the evidences extracted from DATA.json. You can filter the programs from which evidences are extracted using the optional arguments. Run utils.py with -h for more information about these arguments.

To train LDA embeddings on the evidences in the data file:

cd /path/to/bayou/src/ml/bayou/lda
python3 train.py --ntopics <N> --evidence <evidence_type> DATA-with-evidences.json --save save

where <evidence_type> is the type of evidence for which the embeddings are to be trained. As before, run train.py with -h for details about these arguments. The trained embeddings will be in the directory specified by --save (default save).

To train the BED neural network on the data file:

cd /path/to/bayou/src/ml/bayou/core
python3 train.py --config config.json DATA-with-evidences.json --save save

Run train.py with -h for details about the config file.

Note: The BED network looks for pre-trained embeddings for each evidence type in the directory specified by --save (default save). The embeddings for each evidence type must be in a directory named "embed_<evidence_type>" within the save directory. For instance, the embeddings for types should be in save/embed_types, and the embeddings for apicalls should be in save/embed_apicalls. Copy the saved LDA model file(s) for each evidence type into the corresponding directory.
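The copy step in the note can be automated. This is a minimal sketch with a hypothetical helper, `install_embeddings`, that is not part of Bayou; it assumes each LDA --save directory is flat (only regular files are copied) and only reproduces the save/embed_<evidence_type> layout described above.

```python
import shutil
from pathlib import Path


def install_embeddings(lda_dirs, save_dir="save"):
    """Copy each trained LDA model into <save_dir>/embed_<evidence_type>.

    lda_dirs maps an evidence type (e.g. "apicalls", "types") to the directory
    that was passed as --save when training the LDA model for that type.
    Only regular files are copied, assuming each LDA save directory is flat.
    Returns the destination directories that were populated.
    """
    save_dir = Path(save_dir)
    installed = []
    for evidence_type, src in sorted(lda_dirs.items()):
        dest = save_dir / ("embed_" + evidence_type)
        dest.mkdir(parents=True, exist_ok=True)
        for item in Path(src).iterdir():
            if item.is_file():
                shutil.copy2(str(item), str(dest / item.name))
        installed.append(dest)
    return installed
```

For example, `install_embeddings({"apicalls": "lda_apicalls", "types": "lda_types"}, "save")` fills save/embed_apicalls and save/embed_types before you run the BED training command.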

Synthesizer

Suppose that the trained model is in a folder named trained. Run the server to load the trained model into memory. The server listens on a pipe (here, bayoupipe) for inference queries:

mkdir server; cd server
python3 /path/to/bayou/scripts/server.py --save /path/to/trained --pipe bayoupipe

The synthesizer requires as input a Java class with:

  • a method named __bayou_fill that can be empty
  • arguments to this method that can be used for synthesis
  • evidences towards synthesis with the method annotation @Evidence

See examples in test/pl/synthesizer for more information about the input format.
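To make the three requirements concrete, here is a hypothetical Python helper that writes a minimal input class. It is an illustration only: `make_synthesis_input` is not part of Bayou, it supports only the apicalls form of @Evidence, and the Evidence import is copied from the sample output shown earlier. Check the examples in test/pl/synthesizer for the other evidence kinds.

```python
def make_synthesis_input(class_name, params, apicalls, imports=()):
    """Render a minimal synthesizer input: an empty __bayou_fill with @Evidence.

    params:   list of (java_type, name) pairs used as the __bayou_fill arguments.
    apicalls: API method names to pass as @Evidence(apicalls = {...}).
    imports:  extra fully qualified Java imports (e.g. "java.io.FileReader").
    """
    lines = ["import edu.rice.bayou.annotations.Evidence;"]
    lines += ["import %s;" % imp for imp in imports]
    args = ", ".join("%s %s" % (java_type, name) for java_type, name in params)
    evidence = ", ".join('"%s"' % call for call in apicalls)
    lines += [
        "",
        "public class %s {" % class_name,
        "",
        "    @Evidence(apicalls = {%s})" % evidence,
        "    void __bayou_fill(%s) {" % args,  # empty body, to be synthesized
        "    }",
        "",
        "}",
        "",
    ]
    return "\n".join(lines)
```

For example, writing `make_synthesis_input("TestIO1", [("String", "file")], ["readLine", "ready"])` to Program.java gives an input with the same method signature and evidence as the sample run above.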

Use the provided scripts/synthesize.sh to run the synthesizer. First, edit this script to set BAYOU_HOME to the Bayou home folder and BAYOU_SERVER to the folder where you started the server (also change bayoupipe if you used a different name for the pipe). Then, to run the synthesizer on a file Program.java:

synthesize.sh Program.java

If all went well, the synthesizer should output a set of Java programs with the body of the method __bayou_fill synthesized according to the arguments and evidences provided.

Roadmap

  • Model: Encode natural language evidence (Javadoc) better
  • Synthesizer: Extract evidence from surrounding context instead of __bayou_fill
  • General: Gather more training data from a larger corpus