Heuristic Approaches to Program Synthesis: Genetic Programming and Beyond

Lecturer: Krzysztof Krawiec (Poznań University of Technology).

About the lecturer | Course Summary | Bibliography | Slides | Assignment

About the lecturer: Dr. hab. Krzysztof Krawiec is an Associate Professor in the Laboratory of Intelligent Decision Support Systems at the Faculty of Computer Science of Poznań University of Technology, Poznań, Poland. His primary research area is computational intelligence, in particular genetic programming, evolutionary computation, and machine learning, with applications in program synthesis, modeling, image analysis, and games. Dr. Krawiec co-chaired the European Conference on Genetic Programming (EuroGP) in 2013 and 2014, is an associate editor of Genetic Programming and Evolvable Machines journal, and the president of Polish Chapter of IEEE Computational Intelligence Society. His recent work on program synthesis conducted at Computer Science and Artificial Intelligence Laboratory of MIT won him the best paper award at Genetic and Evolutionary Computation GECCO'14. Previously, he was also developing evolutionary image analysis systems at the Center for Research on Intelligence Systems of University of California.

Course summary: The course will revolve around heuristic program synthesis. In program synthesis, the task is to automatically generate a computer program that meets a given set of requirements, provided either as a set of tests (examples), or constraints imposed on program input and output (a.k.a. contracts). We will primarily focus on the generate-and-test approaches to program synthesis, of which the most common is nowadays genetic programming (GP), a bio-inspired methodology of program induction based on the metaheuristic of evolutionary computation.

The course will start with posing the program synthesis task, and outlining the main paradigms of solving this class of problems. The major challenges and factors that contribute to the difficulty of program synthesis will be identified and discussed. We will also establish the relationships of program synthesis to related classes of problems in combinatorial optimization and so-called interactive domains.

Before presenting GP, a short '101' introduction to the field of evolutionary computation (EC) will be necessary. The fundamentals will be illustrated by examples, followed by a selection of recent representative threads within EC research, in particular the estimation of distribution algorithms.

After the EC fundamentals, we will present the main concepts of GP. The major research paradigms of GP will be reviewed, in particular those related to various program representations (functional expression trees, instruction sequences, etc.). We will characterize the relationship between program synthesis tasks and test-based problems, and discuss whether GP can be considered a machine learning technique.

The central part of this course will concern the selected recent developments in GP-based program synthesis, namely semantic and behavioral genetic programming. The common motivation for this research direction can be summarized as follows. In heuristic program synthesis, one usually assumes that the objective function (a.k.a. fitness in EC) should be the only driver to guide the search process. However, such an objective function typically conveys very little information compared to the size of a search space. With such a limited feedback, a search algorithm may struggle to solve the problem. Yet in quite many domains, more information on candidate solutions is available. In GP, one can easily examine program behavior on particular tests or at intermediate execution states. We will demonstrate how to use that information to provide additional search guidance, and discuss the potential impact of these developments for the domain of program synthesis and beyond.

The concepts presented in the course will be illustrated with 'success stories' concerning applications of GP in various contexts, like detection and repair of bugs in commercial software and improvement of non-functional properties of code. We will also present several applications in unconstrained modeling, including discovery of laws from experimental data and prediction of global temperature changes. The growing area of search-based software engineering will be covered as well.

Bibliography: A comprehensible introduction to GP is available in this freely available book:

The course will involve hands-on demonstrations of evolving programs with GP, most probably using the open-source software ECJ framework, available here.

Slides: The lecture slides can be found here.

Assignment: You can find the problem set here. Please send the solutions to the address provided on the problem sheet by February 15.