Oracle-guided component-based program synthesis - Susmit Jha, Sumit Gulwani, Sanjit A. Seshia, Ashish Tiwari
In this paper the authors propose a novel technique for synthesizing loop-free programs. Two potential usage of such a synthesis can be in generating efficient bit manipulating programs and for deobfuscating malware programs. Formally the synthesis problem considered in the paper requires
- A validation oracle V, that, given any candidate program can tell whether the program is the desired one or not. (For the examples in this paper the user herself is the validation oracle)
- An I/O oracle I, that, given any program input, returns the output of the desired program on that input. (Again the user herself is the I/O oracle for the examples in this paper)
- A set of specifications called a library of basic components f, such that each component includes,
o A tuple of input variables and a output variable
o An expression over the input and out variables such that it specifics the input-output relationship of the component
Now the goal of the synthesis problem is to synthesize a program P that can be validated by the oracle V, i.e. V (P) = true. Further, the program P should be contrasted using only the base components in the library.
The authors propose an encoding of the program using a set of integer-valued location variables L, such that the value of the variable determine which components goes where (location) and from which location it takes its input. Next they define two constraints
- I/O behavioural constraint which ensures that the generated program has the same input-output behaviour as that specified by the I/O oracle.
- Distinguishing constraint which generates an input that differentiates this program from another candidate program
Now you can synthesis the candidate program by using SMT solvers to solve the behavioural constraint and then using SMT solvers again to solve the distinguishing constraint that will generate an input (new) which is added to the set of input-output pairs and this continues unless we get a semantically unique program (shown my failure of SMT solver to solve distinguishing constraint). Check that program with the validation oracle to see if it valid.
The evaluation sections shows the applicability of this approach to synthesize minimal bit manipulating programs and the authors also discuss optimizing this technique by using an application dependent bias in the sampling of input space. In the discussion the authors explore connections of this approach with fundamental results in computational learning theory. They say that on the basis of the teaching dimension results by Goldman and Kerns it can be shown that only a few examples would be needed to synthesis these programs as they form a low teaching dimension.
Possible weakness of the approach is limitation to synthesis of simple (minimal) loop-free programs.