A Library for White-Box Testing of Introductory Programming Algorithms
Witter is a software testing library that allows programming instructors to define white-box tests for Java source code. Witter analyzes the execution of a method against a reference solution, to verify that the code not only produces correct results but is also in accordance with a desired algorithm behaviour.
Installation • Specifying Reference Solutions • Testing Arbitrary Solutions • Examples
Witter is an experimental library, and as such is not yet available in build automation tools (Gradle, etc.)
To use Witter in your project, first build its .jar file using Gradle's build task. The .jar file is generated under
the project root in /build/libs
. This file should be copied to your own project's libs
folder,
and then added as a dependency in your build automation tool of choice. For example, in Gradle:
dependencies {
implementation(files("libs/witter-0.5.6.jar"))
}
Note, of course, that the file name can change when updates for Witter are released, and should be changed in your dependency specification accordingly.
You may additionally need to specify dependencies for the Strudel and ANTLR libraries.
One can define the tester cases for a given exercise by writing a reference solution in a Java method, annotated with a header comment that defines the different tester inputs and the metrics that should be measured during tester execution. The content of the comments has to obey Witter’s Test Specification Language (TSL), whose syntax is similar to Java’s annotation syntax:
/*
@Test(new int[] { 1, 2, 3, 4, 5 })
@Test(new int[] { 2, 4, 6 })
@CountLoopIterations
@CountArrayReads
@CheckSideEffects
*/
public static int sum(int[] a) {
...
}
Test cases can be configured to use any number of white-box metrics either throughout
the test or within a bounded scope (using
directive). As in the initial version of
Witter, evaluation metrics can be optionally instantiated with a margin parameter that
specifies an acceptable deviation interval from the reference value.
An object can be created using the new
directive by passing the name of the class to
instantiate followed by a list of arguments to one of the class constructors. References to
the created objects can be stored using ref
. Class methods can be invoked by using the
call
directive on a previously declared reference. A sequence of these directives defines a
stateful test case.
The call
directive is used by specifying the name of the method to be invoked and a list
of arguments. We may use the "dot notation" to perform calls on instance methods given its
reference (ref.call(...)
), as opposed to calling an instance method by passing its reference as the first argument (call("methodName", ref, ...)
).
For every call, the return values of the evaluated method are
compared to the reference solution, allowing for regular black-box testing. Additionally, if
the optional expected argument is passed, Witter will assert that both the reference
solution and the solution under evaluation produce the expected result.
The following example illustrates Witter's DSL with a test suite for list data structures, containing two test cases.
val test = TestSuite(referencePath = "path/to/reference/solution/ArrayList.java") {
Case("testContains") {
// Create new object and store a reference to it
val list = ref { new("ArrayList", 3) }
// Executed without white-box metrics (black-box only)
list.call("size", expected = 0)
list.call("add", "hello")
list.call("size", expected = 1)
list.call("add", "world")
list.call("size", expected = 2)
using(CountLoopIterations() + CountArrayReadAccesses()) {
// These calls compare loop iterations
call("add", list, "algorithm")
call("size", list, expected = 3)
}
}
// All the calls within this case compare loop iterations
Case(CountLoopIterations(), "testIsEmpty") {
val list = ref { new("ArrayList", 3) }
list.call("isEmpty", expected = true)
list.call("add", "hello")
list.call("add", "icpec")
list.call("isEmpty", expected = false)
}
}
As Witter is designed for third-party integration, it provides a form of executing the tests programmatically. Tests are executed providing an annotated reference solution or a test suite created using the DSL as described, and a solution that one wishes to assess.
Using an annotated reference solution, one can execute:
val tester: Test = Test("ReferenceSolution.java")
val results: List<TestResult> = tester.execute("Solution.java")
For a test suite created using the DSL, the process is similar:
val test = TestSuite(referencePath = "path/to/reference/Solution.java")
val results = test.apply(subjectPath = "path/to/submitted/Solution.java")
The results consist of a list of feedback items for each aspect defined in the tester specification, holding the following information:
- a flag indicating success or failure;
- which kind of metric has failed (recall Table 1);
- the location of code elements involved in the failed tests (e.g., procedure, parameters, loop structures);
- a human-readable descriptive feedback message.
Witter currently supports the following runtime metrics.
Metric | TSL Annotation (DSL is identical, without the @) |
Verification |
---|---|---|
Return values | @Test([...args]) | Return value is equal to reference solution. Multiple annotations can be used. |
Side effects | @CheckSideEffects | Side effects on arguments (presence and absence) are the same to those of the reference solution. |
Loop iterations | @CountLoopIterations([threshold]) | Total number of loop iterations matches the one of the reference solution. |
Array allocations | @CheckArrayAllocations | The array allocations match those of the reference solution (component types and lengths). |
Array reads | @CountArrayReads([threshold]) | The number of array read accesses is the same as in the reference solution. |
Array writes | @CountArrayWrites([threshold]) | The number of array write accesses is the same as in the reference solution. |
Object allocations | @CheckObjectAllocations | The number of object allocations and their types match those of the reference solution. |
Recursive calls | @CountRecursiveCalls([threshold]) | The number of recursive calls matches the one of the reference solution. |
Factorial (Recursive)
Reference solution with recursion:
/*
@Test(5)
@CountRecursiveCalls(1)
*/
static int factorial(int n) {
if (n == 0) return 1;
else return n * factorial (n - 1);
}
Solution under testing (iterative, with a defect):
static int factorial(int n) {
int f = 1;
for (int i = 0; i <= n; i++)
f *= i; // i starts at 0, f always 0
return f;
}
Witter tester results (black-box and white-box fail):
[fail] factorial(5)
Expected result: 120
Found: 0
[fail] factorial(5)
Expected recursive calls: 4 (± 1)
Found: 0
Binary Search (Iterative)
Reference solution using binary search:
/*
@Test(new int[] { 1, 2, 3, 4, 5, 6, 7 }, 1)
@Test(new int[] { 1, 3, 7, 9, 11, 13, 17, 19 }, 18)
@CountLoopIterations
@CheckSideEffects
*/
static int binarySearch (int[] a, int e) {
int l = 0;
int r = a. length - 1;
while (l <= r) {
int m = l + (r - l) / 2;
if (a[m] == e) return m;
if (a[m] < e) l = m + 1;
else r = m - 1;
}
return -1;
}
Solution under testing (performing linear search):
static int binarySearch (int[] a, int e) {
for (int i = 0; i < a. length ; i++)
if (a[i] == e) return i;
return -1;
}
Witter tester results (black-box pass, white-box fail):
[pass] search([1, 2, 3, 4, 5, 6, 7], 1)
Expected result: 0
[fail] search([1, 2, 3, 4, 5, 6, 7], 1)
Expected loop iterations: 3
Found: 1
[pass] search([1, 2, 3, 4, 5, 6, 7], 1)
Expected side effects of a: [1, 2, 3, 4, 5, 6, 7]
[pass] search([1, 2, 3, 4, 5, 6, 7], 1)
Expected side effects of e: 1
[pass] search([1, 3, 7, 9, 11, 13, 17, 19], 18)
Expected result: -1
[fail] search([1, 3, 7, 9, 11, 13, 17, 19], 18)
Expected loop iterations: 4
Found: 8
[pass] search([1, 3, 7, 9, 11, 13, 17, 19], 18)
Expected side effects of a: [1, 3, 7, 9, 11, 13, 17, 19]
[pass] search([1, 3, 7, 9, 11, 13, 17, 19], 18)
Expected side effects of e: 18
Insertion Sort (Procedure)
Reference solution performing insertion sort:
/*
@Test(new int[] { 5, 4, 3, 2, 1 })
@CountArrayReads
@CountArrayWrites
@CheckSideEffects
*/
static void sort(int[] a) {
for (int i = 1; i < a. length; i++) {
for (int j = i; j > 0; j--) {
if (a[j] >= a[j - 1]) break;
int tmp = a[i];
a[i] = a[j];
a[j] = tmp;
}
}
}
Solution under testing (performing selection sort):
static void sort(int[] a) {
for (int i = 0; i < a. length - 1; i++) {
int min = i;
for (int j = i + 1; j < a. length ; j++)
if (a[j] < a[min]) min = j;
int tmp = a[i];
a[i] = a[min];
a[min] = tmp;
}
}
Witter tester results (black-box pass, white-box fail):
[fail] sort([5, 4, 3, 2, 1])
Expected array reads: 40
Found: 28
[fail] sort([5, 4, 3, 2, 1])
Expected array writes: 20
Found: 8
[pass] sort([5, 4, 3, 2, 1])
Expected side effects of a: [5, 4, 3, 2, 1]
Array Average (Procedure)
Reference solution (Average.java):
static double average(double[] a) {
double sum = 0.0;
for (int i = 0; i < a.length; i++) sum += a[i];
return sum / a.length;
}
DSL Test Suite:
val tests = TestSuite("path/to/reference/Average.java") {
Case(CountLoopIterations()) {
call("average", listOf(1,2,3,4,5), expected = 3.0)
call("average", listOf(0,2,3,5,7), expected = 3.4)
}
}
Solution under testing with a defect - starts at index 1 (Solution.java):
static double average(double[] a) {
double sum = 0.0;
for (int i = 1; i < a.length; i++) sum += a[i];
return sum / a.length;
}
Applying the Test Suite to the solution under evaluation:
val results: List<ITestResult> = tests.apply(
subjectPath = "path/to/Solution.java"
)
results.forEach { println("$it\n") }
Test results:
[fail] average([1, 2, 3, 4, 5])
Expected: 3.0
Found: 2.8
[fail] average([1, 2, 3, 4, 5])
Expected loop iterations: 5
Found: 4
[pass] average([0, 2, 3, 5, 7])
Expected: 3.4
[fail] average([0, 2, 3, 5, 7])
Expected loop iterations: 5
Found: 4
Stack (Data Structure)
Reference solution's size method (StackReference.java):
public int size() {
return this.size; // Auxiliary integer attribute
}
DSL Test Suite:
val tests = TestSuite("path/to/reference/StackReference.java") {
Case {
val stack = ref { new("Stack", 5) }
call("push", stack, 1)
call("push", stack, 2)
call("push", stack, 3)
using (CountLoopIterations() + CountArrayReadAccesses()) {
call("size", stack, expected = 3)
}
}
}
Solution under testing with a defect - counts non-zero items individually (StackSolution.java):
public int size() {
int s = 0;
for (int i = 0; i < stack.length; i++)
if (stack[i] != 0)
s += 1;
return s;
}
Test results:
[pass] size(Stack(stack=[1, 2, 3, 0, 0], size=3))
Expected: 3
[fail] size(Stack(stack=[1, 2, 3, 0, 0], size=3))
Expected loop iterations: 0
Found: 5
[fail] size(Stack(stack=[1, 2, 3, 0, 0], size=3))
Expected array reads: 0
Found: 5
The following is an example of how Witter could be integrated into an existing development system, using a simple GUI custom-made for example purposes.
If you use or reference Witter in your academic work, you should cite the relevant following publication(s).
Witter: A Library for White-Box Testing of Introductory Programming Algorithms
ACM Reference Format
Afonso B. Caniço and André L. Santos. 2023. Witter: A Library for White-Box Testing of Introductory Programming Algorithms. In Proceedings of the 2023 ACM SIGPLAN International Symposium on SPLASH-E (SPLASH-E ’23), October 25, 2023, Cascais, Portugal. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3622780.3623650
BibTeX
@inproceedings{canicosantos2023,
author = {Cani\c{c}o, Afonso and Santos, Andr\'{e}},
title = {Witter: A Library for White-Box Testing of Introductory Programming Algorithms},
year = {2023},
isbn = {9798400703904},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3622780.3623650},
doi = {10.1145/3622780.3623650},
abstract = {Software testing is mostly performed in a black-box manner, that is, without incorporating any knowledge of the internal workings of programs into the tests. This practice usually suffices for enterprises and general practitioners, where the focus lies on producing reliable results while most algorithmic tasks are provided by third-party libraries. However, for computer science students and the like, it might not be straightforward to discern the underlying causes of an incorrect tester result or to understand why certain algorithmic goals are not met. We present Witter, a software testing library that allows programming educators to define white-box tests for Java source code. Our tests analyze the execution of a method against a reference solution, to verify that the code not only produces correct results but is also in accordance with a desired algorithm behavior.},
booktitle = {Proceedings of the 2023 ACM SIGPLAN International Symposium on SPLASH-E},
pages = {69–74},
numpages = {6},
keywords = {programming education, white-box testing, feedback, assessment},
location = {Cascais, Portugal},
series = {SPLASH-E 2023}
}
A Domain-Specific Language for Dynamic White-Box Evaluation of Java Assignments
Dagstuhl Reference Format
Afonso B. Caniço and André L. Santos. A Domain-Specific Language for Dynamic White-Box Evaluation of Java Assignments. In 5th International Computer Programming Education Conference (ICPEC 2024). Open Access Series in Informatics (OASIcs), Volume 122, pp. 2:1-2:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024) https://doi.org/10.4230/OASIcs.ICPEC.2024.2
BibTeX
@InProceedings{b.canico_et_al:OASIcs.ICPEC.2024.2,
author = {B. Cani\c{c}o, Afonso and Santos, Andr\'{e} L.},
title = {{A Domain-Specific Language for Dynamic White-Box Evaluation of Java Assignments}},
booktitle = {5th International Computer Programming Education Conference (ICPEC 2024)},
pages = {2:1--2:13},
series = {Open Access Series in Informatics (OASIcs)},
ISBN = {978-3-95977-347-8},
ISSN = {2190-6807},
year = {2024},
volume = {122},
editor = {Santos, Andr\'{e} L. and Pinto-Albuquerque, Maria},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.ICPEC.2024.2},
URN = {urn:nbn:de:0030-drops-209715},
doi = {10.4230/OASIcs.ICPEC.2024.2},
annote = {Keywords: White-box assessment, student assessment, programming education}
}
White-Box Assessment for Programming Education
This is a dissertation produced for the completion of the Master's of Science (MSc) in Computer Science and Engineering by the author Afonso B. Caniço.
ACM Reference Format
Afonso Manuel Barral Caniço. 2024. White-Box Assessment for Programming Education. Master's Thesis. Iscte - Instituto Universitário de Lisboa, Avenida das Forças Armadas, 1649-026 Lisboa. http://hdl.handle.net/10071/32252.
BibTeX
@mastersthesis{canico2024,
author = {Cani\c{c}o, Afonso Manuel Barral},
title = {White-Box Assessment for Programming Education},
school = {Iscte - Instituto Universit\'{a}rio de Lisboa},
year = {2024},
type = {Master's Thesis},
address = {Avenida das For\c{c}as Armadas, 1649-026, Lisboa},
month = {July},
note = {\url{http://hdl.handle.net/10071/32252}}
}
If you have any questions regarding Witter, its development process, or the related academic publications, feel free to contact the authors:
- Afonso B. Caniço - ambco@iscte-iul.pt (Principal Author)
- André L. Santos - andre.santos@iscte-iul.pt (Advisor)
The Witter library is authored, developed, and currently maintained by Afonso B. Caniço. (That's me! :D)
Credit for the Strudel library, which is used within Witter as the runtime environment for code execution, goes to André L. Santos, its author and main contributor. Contributions to Java translation and minor contributions to some aspects of the Strudel library were made by Afonso B. Caniço.