Bayou is a data-driven program synthesis system for Java that uses learned Bayesian specifications for efficient synthesis.
Bayou is described in a paper on arXiv.
There are three main modules in Bayou:
- driver: extracts sketches (in the DSL) and evidences from a Java program to generate the training data
- model: implements the BED neural network (see paper), word embeddings, their training and inference procedures
- synthesizer: during inference, performs combinatorial enumeration to concretize a sketch sampled from the BED into a Java program
Requirements:
- JDK 1.8
- Python 3 (tested with 3.5.1)
- TensorFlow (tested with 1.2)
- scikit-learn (tested with 0.18.1)
Build Bayou from source:
git clone https://github.com/capergroup/bayou.git
cd bayou/tool_files/build_scripts
sudo ./install_deps.sh
./build.sh
cd out
chmod +x install_dependencies_apt.sh
sudo ./install_dependencies_apt.sh
On macOS, run install_dependencies_mac.sh instead.
chmod +x start_bayou.sh synthesize.sh
./start_bayou.sh &
Wait until you see:
===================================
Bayou Ready
===================================
then execute:
./synthesize.sh
You should see output ending with something similar to:
/* --- End of application --- */
import edu.rice.bayou.annotations.Evidence;
import java.io.IOException;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
public class TestIO1 {
    @Evidence(apicalls = {"readLine", "ready"})
    void __bayou_fill(String file) {
        String s1;
        String s;
        boolean b;
        BufferedReader br;
        FileReader fr;
        try {
            fr = new FileReader(file);
            br = new BufferedReader(fr);
            while (b = br.ready()) {
                s = br.readLine();
                s1 = br.readLine();
            }
            br.close();
        } catch (FileNotFoundException _e) {
        } catch (IOException _e) {
        }
    }
}
/* --- End of application --- */
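Instead of watching the terminal for the ready banner, the start-and-wait step can be scripted. The following is a minimal Python sketch, not part of Bayou: `start_and_wait` and `_drain` are hypothetical helpers that launch a command, block until a line containing "Bayou Ready" appears in its combined stdout/stderr, and then keep draining the output in a background thread so the server cannot stall on a full pipe.

```python
import os
import subprocess
import threading
from typing import List


def _drain(stream) -> None:
    """Read and discard lines until EOF so the child never blocks on a full pipe."""
    for _ in stream:
        pass


def start_and_wait(cmd: List[str], marker: str = "Bayou Ready") -> subprocess.Popen:
    """Start a long-running command and return once `marker` appears in its output."""
    # PYTHONUNBUFFERED only affects Python child processes; it stops them from
    # holding the ready message in a buffer when stdout is a pipe.
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True, env=env)
    for line in proc.stdout:
        if marker in line:
            # Hand the rest of the output to a background reader and return the live process.
            threading.Thread(target=_drain, args=(proc.stdout,), daemon=True).start()
            return proc
    proc.wait()  # EOF reached without the marker: reap the process before failing
    raise RuntimeError("process exited before printing {!r}".format(marker))
```

For example, `start_and_wait(["./start_bayou.sh"])` followed by `subprocess.run(["./synthesize.sh"], check=True)` mirrors the manual steps. Output from non-Python programs may still arrive in buffered chunks, which only delays, but does not prevent, detection of the banner.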
To build the Java components under src/pl with Ant:
cd /path/to/bayou/src/pl
ant
If you are working with the Android SDK, add android.jar to the classpath:
export CLASSPATH=/path/to/android.jar
If needed, Bayou ships an android.jar from Android 24 in its lib directory.
After setup, run the tests to make sure everything works:
cd scripts
python3 test_driver.py
Use the following command to run the driver on Program.java with the config file config.json:
java -jar /path/to/bayou/src/pl/out/artifacts/driver/driver.jar -f Program.java -c config.json [-o output.json]
Run the driver with -h for details about the config file. The optional -o flag writes the sketch to a JSON file.
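To produce one JSON file per program for a whole corpus, the driver can be run in a loop. This is a minimal Python sketch, not part of Bayou: `driver_command` and `run_driver_on_dir` are hypothetical helper names, the sketch assumes one driver invocation per source file, and it uses only the -f, -c and -o flags documented above.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def driver_command(jar: str, program: str, config: str,
                   output: Optional[str] = None) -> List[str]:
    """Build the driver invocation: java -jar <jar> -f <program> -c <config> [-o <output>]."""
    cmd = ["java", "-jar", jar, "-f", program, "-c", config]
    if output is not None:
        cmd += ["-o", output]  # optional: write the sketch to a JSON file
    return cmd


def run_driver_on_dir(jar: str, src_dir: str, config: str, out_dir: str) -> None:
    """Hypothetical batch helper: run the driver once per .java file in src_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for program in sorted(Path(src_dir).glob("*.java")):
        target = out / (program.stem + ".json")  # e.g. Program1.java -> Program1.json
        # check=True raises CalledProcessError if the driver exits with a non-zero status
        subprocess.run(driver_command(jar, str(program), config, str(target)), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect or log before launching a long batch.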
To create a single JSON file with the entire dataset, collect the JSON output of each program into a list and wrap it in a top-level JSON object whose only key, "programs", has that list as its value. For example, if you have files Program1.json, ..., Program10000.json, the dataset should have the content:
{
  "programs": [
    <Program1.json>,
    <Program2.json>,
    ...
    <Program10000.json>
  ]
}
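The aggregation step can be done with Python's standard json module. This is a minimal sketch; `merge_dataset` is a hypothetical helper, and it assumes each per-program file holds a single JSON value that becomes one element of the "programs" list.

```python
import json
from typing import Iterable


def merge_dataset(json_files: Iterable[str], out_path: str) -> int:
    """Collect per-program JSON files into a single {"programs": [...]} dataset.

    Each file is parsed and appended as one list element, in the order given.
    Returns the number of programs written.
    """
    programs = []
    for path in json_files:
        with open(path) as f:
            programs.append(json.load(f))
    with open(out_path, "w") as f:
        json.dump({"programs": programs}, f)
    return len(programs)
```

For example, `merge_dataset(sorted(glob.glob("Program*.json")), "DATA.json")` builds the dataset file. Note that `sorted` orders names lexicographically (Program1, Program10, Program100, ...), which is harmless if program order does not matter for training.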
First, set the PYTHONPATH environment variable:
cd /path/to/bayou/src/ml
export PYTHONPATH=`pwd`
To extract evidences from a data file DATA.json generated by the driver:
cd /path/to/bayou/src/ml/bayou/core
python3 utils.py DATA.json DATA-with-evidences.json [--max_seqs N] [--max_seq_length M]
This will create DATA-with-evidences.json with the evidences extracted from DATA.json. You can filter the programs from which evidences are extracted using the optional arguments. Run utils.py with -h for more information about these arguments.
To train LDA embeddings on the evidences in the data file:
cd /path/to/bayou/src/ml/bayou/lda
python3 train.py --ntopics <N> --evidence <evidence_type> DATA-with-evidences.json --save save
where <evidence_type> is the type of evidence for which the embeddings are to be trained. As before, run train.py with -h for details about these arguments. The trained embeddings will be in the directory specified by --save (default save).
To train the BED neural network on the data file:
cd /path/to/bayou/src/ml/bayou/core
python3 train.py --config config.json DATA-with-evidences.json --save save
Run train.py with -h for details about the config file.
Note: The BED network will look for the pre-trained embeddings for each evidence type in the directory specified by --save (default save). The embeddings for each evidence type must be in a directory named "embed_<evidence_type>" within the save directory. For instance, the embeddings for types should be in save/embed_types, and the embeddings for apicalls should be in save/embed_apicalls. Copy the saved LDA model file(s) for each evidence type into the corresponding directory.
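The copy step can be scripted so that every evidence type lands in the expected embed_<evidence_type> directory. This is a minimal Python sketch: `install_embeddings` is a hypothetical helper, it assumes each LDA model was trained with its own --save directory, and because the exact file names written by lda/train.py are not specified here, it copies every regular file from that directory (subdirectories are skipped).

```python
import shutil
from pathlib import Path
from typing import Dict


def install_embeddings(lda_dirs: Dict[str, str], save_dir: str = "save") -> Dict[str, Path]:
    """Copy each evidence type's saved LDA model into <save_dir>/embed_<evidence_type>.

    lda_dirs maps an evidence type (e.g. "apicalls", "types") to the directory
    that was passed to lda/train.py via --save.
    """
    installed = {}
    for evidence_type, src in lda_dirs.items():
        dest = Path(save_dir) / ("embed_" + evidence_type)
        dest.mkdir(parents=True, exist_ok=True)
        for item in Path(src).iterdir():
            if item.is_file():  # copy model file(s) only; subdirectories are skipped
                shutil.copy2(str(item), str(dest / item.name))
        installed[evidence_type] = dest
    return installed
```

For example, `install_embeddings({"apicalls": "lda_apicalls", "types": "lda_types"})` would populate save/embed_apicalls and save/embed_types, where the two source directory names are placeholders for wherever the LDA models were saved.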
Suppose that the trained model is in a folder trained. Run the server to load the trained model into memory. The server will listen to a pipe (here bayoupipe) for inference queries:
mkdir server; cd server
python3 /path/to/bayou/scripts/server.py --save /path/to/trained --pipe bayoupipe
The synthesizer requires as input a Java class with:
- a method named __bayou_fill, whose body can be empty
- arguments to this method, which can be used during synthesis
- evidences to guide synthesis, given through the method annotation @Evidence
See the examples in test/pl/synthesizer for more information about the input format.
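For illustration, an input class of this shape can be generated programmatically. The Python sketch below is modelled on the TestIO1 example output shown earlier; `bayou_input` is a hypothetical helper, it emits only the apicalls evidence that appears in that example, and it assumes the synthesizer accepts a class whose only member is an empty __bayou_fill method.

```python
from typing import List, Tuple


def bayou_input(class_name: str, params: List[Tuple[str, str]], apicalls: List[str]) -> str:
    """Render a synthesizer input class with an empty, annotated __bayou_fill.

    params is a list of (Java type, name) pairs for __bayou_fill; apicalls are
    the API-call evidences. No imports are added for parameter types.
    """
    calls = ", ".join('"{}"'.format(c) for c in apicalls)
    args = ", ".join("{} {}".format(t, n) for t, n in params)
    return "\n".join([
        "import edu.rice.bayou.annotations.Evidence;",
        "",
        "public class {} {{".format(class_name),
        "    @Evidence(apicalls = {{{}}})".format(calls),
        "    void __bayou_fill({}) {{".format(args),
        "    }",
        "}",
        "",
    ])
```

For example, writing `bayou_input("Program", [("String", "file")], ["readLine", "ready"])` to Program.java gives an input similar to the TestIO1 class. Parameter types outside java.lang would also need import lines, which this sketch does not generate.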
Use the provided scripts/synthesize.sh to run the synthesizer. First, edit this script to set the environment variables BAYOU_HOME and BAYOU_SERVER to the Bayou home folder and the folder where you started the server, respectively (and change the pipe name bayoupipe if you used a different one). Then, to run the synthesizer on a file Program.java:
synthesize.sh Program.java
If all went well, the synthesizer should output a set of Java programs in which the body of the method __bayou_fill is synthesized according to the arguments and evidences provided.
Future work:
- Model: Encode natural language evidence (Javadoc) better
- Synthesizer: Extract evidence from the surrounding context instead of from __bayou_fill
- General: Gather more training data from a larger corpus