Graphical models and what they reveal about GP when it solves a symbolic regression problem

We introduce graphical models as a new means of understanding genetic programming dynamics, complementing statistics such as mean tree size. Graphical models reveal the dependency structure of the multivariate distribution associated with functions and terminals in solution structures. This information is more semantically than syntactically oriented. As a first step, using the Pagie-2D problem as our exemplar, we present the generation and inter-generation dynamics of genetic programming in terms of graphical models that are largely unrestricted in structure. Open for discussion are questions such as: should an estimation of distribution genetic programming algorithm mimic standard genetic programming's search bias in terms of tree size and shape? And does graphical model analysis indicate a better way to control the search bias for symbolic regression, whether by operator design, size control, bloat control, or other means?
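
For readers unfamiliar with the exemplar, the Pagie-2D benchmark is usually stated as f(x, y) = 1/(1 + x^-4) + 1/(1 + y^-4). The sketch below writes that target down and builds a set of fitness cases. The helper names `pagie_2d` and `fitness_cases` are hypothetical, and the sampling grid (26 evenly spaced points per axis starting at -5 with step 0.4) is an assumption taken from common benchmark practice, not something this abstract specifies.

```python
def pagie_2d(x: float, y: float) -> float:
    """Pagie-2D target: 1/(1 + x^-4) + 1/(1 + y^-4).

    Undefined at x == 0 or y == 0, because x ** -4 divides by zero there.
    """
    return 1.0 / (1.0 + x ** -4) + 1.0 / (1.0 + y ** -4)


def fitness_cases(lo: float = -5.0, step: float = 0.4, n: int = 26):
    """Return (x, y, target) triples on an n-by-n grid.

    The default grid is an assumed setup; it never lands on 0,
    so the target is defined at every sampled point.
    """
    axis = [lo + step * i for i in range(n)]
    return [(x, y, pagie_2d(x, y)) for x in axis for y in axis]


if __name__ == "__main__":
    cases = fitness_cases()
    print(len(cases), "fitness cases")
    print("f(1, 1) =", pagie_2d(1.0, 1.0))
```

A candidate expression evolved by genetic programming would be scored by comparing its outputs against the third element of each triple, for example with a sum of absolute errors.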