Cerebro: Static Subsuming Mutant Selection

This repo contains the code, dataset, and trained models for the paper "Cerebro: Static Subsuming Mutant Selection", published in IEEE Transactions on Software Engineering (TSE).

The paper is available here:

The bib entry for citing the paper is available here:

The dataset is composed of the following:

Source code of the 48 GNU Coreutils [1] programs written in C and of 10 Java projects from Apache Commons Proper [2], Joda-Time [3], and Jsoup [4];
Mutant information in JSON format for every program/project, with Mutant ID, Source Code File Name, Mutation Type, and Line #;
Subsuming mutant label information in JSON format for every program/project, mapped to each mutant by ID;
Abstracted code for every original source code file and every mutant of each program/project; and
Mutant annotation sequences for all mutants in every program/project, as pairs of lhs (input) and rhs (expected output), with mappings from sequence file indexes to Mutant IDs and from sequences to original code file indexes.
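The mutant information and the subsuming labels are linked through the Mutant ID. The minimal sketch below illustrates that join in memory. MutantInfo and SubsumingFilter are hypothetical classes, the field names are illustrative rather than the actual JSON keys, the sample values are made up, and JSON parsing is left out because it needs a third-party library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: joins mutant records with subsuming labels on Mutant ID.
final class SubsumingFilter {
    // Returns the mutants whose label is true; mutants with no label are treated as non-subsuming.
    static List<MutantInfo> subsuming(List<MutantInfo> mutants, Map<Integer, Boolean> labels) {
        List<MutantInfo> result = new ArrayList<>();
        for (MutantInfo m : mutants) {
            if (labels.getOrDefault(m.id, false)) {
                result.add(m);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative values only; not taken from the dataset.
        List<MutantInfo> mutants = List.of(
            new MutantInfo(1, "ls.c", "ROR", 120),
            new MutantInfo(2, "ls.c", "AOR", 133),
            new MutantInfo(3, "cat.c", "ROR", 57));
        Map<Integer, Boolean> labels = Map.of(1, true, 2, false, 3, true);
        for (MutantInfo m : subsuming(mutants, labels)) {
            System.out.println(m.id + " " + m.sourceFile + ":" + m.line + " " + m.mutationType);
        }
    }
}

// Hypothetical in-memory form of one mutant-information entry; field names are illustrative.
final class MutantInfo {
    final int id;              // Mutant ID
    final String sourceFile;   // Source code file name
    final String mutationType; // Mutation type
    final int line;            // Line # of the mutation

    MutantInfo(int id, String sourceFile, String mutationType, int line) {
        this.id = id;
        this.sourceFile = sourceFile;
        this.mutationType = mutationType;
        this.line = line;
    }
}
```

Running main prints the entries for mutants 1 and 3, the two labeled as subsuming in the made-up sample.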

Tools/dependencies required before executing the code:

Apache Maven (available here: https://maven.apache.org/download.cgi)
srcML (available here: https://www.srcml.org/)

NOTE: before building, modify the path variables in the data.java file, such as the one below, to point to your desired repository locations and/or dependencies:

static String dirDataset = "D:/ag/github/Cerebro/dataset";

Commands to execute:

mvn clean package

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar [arguments]

Available options, by task:

To prepare the dataset for model training:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep [language] [sequence-length] [abstraction-level]

where,

[language] is either c or java

[sequence-length] is the desired number of tokens in a sequence (a numeric value), e.g. 25, 50, or 100

[abstraction-level] is either full (abstracted code) or partial (no abstraction; only code comments removed)

For example, to create the dataset for the Java projects with a sequence length of 100 and full abstraction, run:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep java 100 full

To create the dataset for the C programs with a sequence length of 50 and no abstraction (only code comments removed), run:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep c 50 partial
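The prep, test, and combinetosimulate tasks all take the same [language], [sequence-length], and [abstraction-level] arguments described above. As a minimal sketch of the accepted values, the hypothetical CerebroArgs class below validates such an argument list. It is an illustration only, not Cerebro's actual argument handling, and the exact error behavior of the real jar may differ.

```java
import java.util.Set;

// Hypothetical validator for <task> [language] [sequence-length] [abstraction-level].
// Illustration only; this is not Cerebro's actual argument handling.
final class CerebroArgs {
    static final Set<String> TASKS = Set.of("prep", "test", "combinetosimulate");
    static final Set<String> LANGUAGES = Set.of("c", "java");
    static final Set<String> ABSTRACTION_LEVELS = Set.of("full", "partial");

    final String task;
    final String language;
    final int sequenceLength;
    final String abstractionLevel;

    private CerebroArgs(String task, String language, int sequenceLength, String abstractionLevel) {
        this.task = task;
        this.language = language;
        this.sequenceLength = sequenceLength;
        this.abstractionLevel = abstractionLevel;
    }

    // Parses e.g. {"prep", "java", "100", "full"}; throws IllegalArgumentException on bad input.
    static CerebroArgs parse(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                "usage: <task> [language] [sequence-length] [abstraction-level]");
        }
        require(TASKS.contains(args[0]), "unknown task: " + args[0]);
        require(LANGUAGES.contains(args[1]), "language must be c or java: " + args[1]);
        int length;
        try {
            length = Integer.parseInt(args[2]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("sequence length must be numeric: " + args[2]);
        }
        require(length > 0, "sequence length must be positive: " + length);
        require(ABSTRACTION_LEVELS.contains(args[3]),
            "abstraction level must be full or partial: " + args[3]);
        return new CerebroArgs(args[0], args[1], length, args[3]);
    }

    private static void require(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    public static void main(String[] args) {
        CerebroArgs parsed = parse(new String[] {"prep", "c", "50", "partial"});
        System.out.println(parsed.task + " " + parsed.language + " "
            + parsed.sequenceLength + " " + parsed.abstractionLevel);
    }
}
```

Running main prints prep c 50 partial, the argument list of the second example.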

To test the model's performance by evaluating the model-generated sequences:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar test [language] [sequence-length] [abstraction-level]

[language], [sequence-length], and [abstraction-level] take the same values as described above.
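One plausible way to score generated sequences is exact match against the expected rhs sequences. The sketch below assumes that metric and a layout of one sequence per line in each file; both are assumptions, and the test task may compute different or additional measures. ExactMatch is a hypothetical class.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical exact-match scorer. ASSUMPTION: one sequence per line, compared position by position.
// This is an illustrative metric, not necessarily what Cerebro's "test" task computes.
final class ExactMatch {
    // Fraction of positions where the generated sequence equals the expected one,
    // ignoring leading/trailing whitespace. Both lists must have the same size.
    static double accuracy(List<String> expected, List<String> generated) {
        if (expected.size() != generated.size()) {
            throw new IllegalArgumentException("expected " + expected.size()
                + " sequences but got " + generated.size());
        }
        if (expected.isEmpty()) {
            return 0.0;
        }
        int matches = 0;
        for (int i = 0; i < expected.size(); i++) {
            if (expected.get(i).trim().equals(generated.get(i).trim())) {
                matches++;
            }
        }
        return (double) matches / expected.size();
    }

    // Reads both files (UTF-8, one sequence per line) and scores them.
    static double accuracy(Path expectedFile, Path generatedFile) throws IOException {
        return accuracy(Files.readAllLines(expectedFile), Files.readAllLines(generatedFile));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            System.out.println(accuracy(Path.of(args[0]), Path.of(args[1])));
        } else {
            // Made-up sequences: one of two matches, so this prints 0.5.
            System.out.println(accuracy(List.of("a b c", "d e"), List.of("a b c", "d x")));
        }
    }
}
```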

To generate XML files as input for the simulation:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar combinetosimulate [language] [sequence-length] [abstraction-level]

[language], [sequence-length], and [abstraction-level] take the same values as described above.

Where to find trained models in the repo?

The trained models are located at:

dataset/subsuming-mutant-prediction-[language]/smp/smp-[language]-[sequence-length]-[fold#]/model

For example, the fold-01 model trained for the Java projects with abstracted sequences of length 100 is available at:

dataset/subsuming-mutant-prediction-java/smp/smp-java-100-01/model
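The path template can be filled in programmatically. This minimal sketch uses a hypothetical ModelPaths helper and assumes fold numbers are zero-padded to two digits, which is inferred from the smp-java-100-01 example rather than stated in the repo.

```java
// Hypothetical helper that fills in the model-path template above.
final class ModelPaths {
    // Builds dataset/subsuming-mutant-prediction-[language]/smp/smp-[language]-[sequence-length]-[fold#]/model
    // ASSUMPTION: fold numbers are zero-padded to two digits, inferred from smp-java-100-01.
    static String modelDir(String language, int sequenceLength, int fold) {
        return String.format("dataset/subsuming-mutant-prediction-%s/smp/smp-%s-%d-%02d/model",
            language, language, sequenceLength, fold);
    }

    public static void main(String[] args) {
        // Prints the example path for the fold-01 Java model with sequence length 100.
        System.out.println(modelDir("java", 100, 1));
    }
}
```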

Tools/dependencies required to train/test the models:

seq2seq (available here: https://google.github.io/seq2seq/getting_started/#download-setup)
Tkinter (available here: https://docs.python.org/3.8/library/tkinter.html)
TensorFlow (available here: https://www.tensorflow.org/install/pip)
PyYAML (available here: https://pyyaml.org/wiki/LibYAML)
Perl (available here: https://www.cpan.org/modules/INSTALL.html)

For model training:

Refer to the train.sh script at Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/train.sh. Its usage is:

./train.sh [dirpath] [training-samples-num * epoch-num] [dirpath]/model [config] 1 [training-samples-num] [training-samples-num] 0

Below is a sample invocation that trains a model for 10 epochs on the Java projects with sequence length 50 and 135,903 training samples (135,903 × 10 = 1,359,030):

./train.sh ../smp-java-50-01 1359030 ../smp-java-50-01/model length_51-g-1-2 1 135903 135903 0

The available configurations are in the directory Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/configs.

For sequence lengths 25, 50, and 100, use length_26-g-1-2, length_51-g-1-2, and length_101-g-1-2, respectively.
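To make the argument pattern concrete, here is a minimal sketch of a hypothetical TrainArgs helper that assembles the train.sh arguments from the template above. The config-name rule (sequence length + 1) is only inferred from the three documented pairs for 25, 50, and 100, and the literal 1 and 0 arguments are copied from the template without interpretation.

```java
import java.util.List;

// Hypothetical helper that assembles train.sh arguments following the documented template:
// [dirpath] [training-samples-num * epoch-num] [dirpath]/model [config] 1 [training-samples-num] [training-samples-num] 0
final class TrainArgs {
    static List<String> build(String dirPath, long trainingSamples, int epochs, int sequenceLength) {
        // ASSUMPTION: config name is length_<sequence-length + 1>-g-1-2 (verified only for 25, 50, 100).
        String config = "length_" + (sequenceLength + 1) + "-g-1-2";
        return List.of(
            dirPath,
            Long.toString(trainingSamples * epochs), // [training-samples-num * epoch-num]
            dirPath + "/model",
            config,
            "1",                                      // copied from the template as-is
            Long.toString(trainingSamples),
            Long.toString(trainingSamples),
            "0");                                     // copied from the template as-is
    }

    public static void main(String[] args) {
        System.out.println("./train.sh " + String.join(" ", build("../smp-java-50-01", 135903, 10, 50)));
    }
}
```

Running main prints the same command as the sample invocation for smp-java-50-01.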

For model testing:

Refer to the test.sh script at Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/test.sh. Its usage is:

./test.sh [dirpath]/test [dirpath]/model [desired-generated-sequences-file-name]

Below is a sample invocation that uses the trained model at ../smp-java-50-01/model and the test set at ../smp-java-50-01/test to write the generated sequences to genrhs-smp-java-50-01.txt:

./test.sh ../smp-java-50-01/test ../smp-java-50-01/model genrhs-smp-java-50-01.txt

Note:

A few models were larger than 100 MB, so each was split into two files so that it could be checked in. These models are:

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-01/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-02/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-03/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-04/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-05/model/model.ckpt.data-00000-of-00001

In each of these cases, model.ckpt.data-00000-of-00001 was split into model.ckpt.data-00000-of-00001.001 and model.ckpt.data-00000-of-00001.002.
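To reassemble one of these checkpoints, the parts can be concatenated in order. The minimal sketch below does that with a hypothetical JoinParts helper. It assumes the .001/.002 files are plain byte-level splits rather than volumes of a compressed archive, which the repo does not state, so verify the result before loading the model.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: rebuilds <base> from <base>.001, <base>.002, ... in order.
// ASSUMPTION: the parts are raw byte-level splits, not a compressed archive.
final class JoinParts {
    // Path of part number `index`, e.g. index 1 -> model.ckpt.data-00000-of-00001.001
    static Path partPath(Path base, int index) {
        return base.resolveSibling(base.getFileName() + String.format(".%03d", index));
    }

    // Concatenates consecutive parts until one is missing and returns how many were joined.
    // An existing file at `base` is overwritten.
    static int join(Path base) throws IOException {
        if (!Files.exists(partPath(base, 1))) {
            throw new IOException("first part not found: " + partPath(base, 1));
        }
        int index = 1;
        try (OutputStream out = Files.newOutputStream(base)) {
            Path part;
            while (Files.exists(part = partPath(base, index))) {
                try (InputStream in = Files.newInputStream(part)) {
                    in.transferTo(out); // streams the part without loading it fully into memory
                }
                index++;
            }
        }
        return index - 1;
    }

    public static void main(String[] args) {
        Path base = Paths.get(args.length > 0 ? args[0]
            : "dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-01/model/model.ckpt.data-00000-of-00001");
        try {
            System.out.println("joined " + join(base) + " part(s) into " + base);
        } catch (IOException e) {
            System.err.println("could not join parts: " + e.getMessage());
        }
    }
}
```

Under the same assumption, running cat model.ckpt.data-00000-of-00001.001 model.ckpt.data-00000-of-00001.002 > model.ckpt.data-00000-of-00001 in a Unix-like shell has the same effect.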

References

[1] GNU Coreutils. https://www.gnu.org/software/coreutils/, (last accessed April 24, 2021).

[2] Apache Commons Proper. https://commons.apache.org, (last accessed April 24, 2021).

[3] Joda-Time. https://github.com/JodaOrg/joda-time/, (last accessed April 24, 2021).

[4] Jsoup. https://github.com/jhy/jsoup, (last accessed April 24, 2021).
