Feature Extraction from Optimization Data via DataModeler's Ensemble Symbolic Regression

We demonstrate a means of knowledge discovery through feature extraction that exploits the search history of an optimization run. We regress a symbolic model ensemble from the search points of an optimization run and their objective scores. The frequency with which a variable appears in the models of the ensemble indicates the extent to which it is an influential feature. Our demonstration uses a genetic programming symbolic regression software package that is designed to be "off-the-shelf". By default, the only parameter needed to evolve a suite of models is how long the user is willing to wait. The user can then easily specify which models should go forward, based on sufficient accuracy and acceptable complexity. For illustration, we consider a common design heuristic in serial sensor sequencing: "place the most reliable sensor last". The heuristic is derived from the mathematical form of the objective function, which emphasizes the decision variable pertaining to the last sensor. Feature extraction on optimized sensor sequences indicates that the heuristic is usually effective, though not always trustworthy. This is consistent with existing knowledge in sensor processing.
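The two steps described above, filtering the ensemble by accuracy and complexity and then counting how often each variable appears, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DataModeler's actual interface: the `Model` record, the `select_models` and `variable_presence` helpers, the error and complexity metrics, the thresholds, and the toy ensemble are all hypothetical, and the numbers are not results from this study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Summary of one ensemble member (hypothetical record, not DataModeler's API)."""
    expression: str       # readable formula, kept only for inspection
    variables: frozenset  # names of the input variables the expression references
    error: float          # training error, e.g. 1 - R^2 (assumed metric)
    complexity: int       # e.g. node count of the expression tree (assumed metric)


def select_models(models, max_error, max_complexity):
    """Keep only models that are accurate enough and simple enough."""
    return [m for m in models
            if m.error <= max_error and m.complexity <= max_complexity]


def variable_presence(models):
    """Fraction of the selected models in which each variable appears.

    Sorted from most to least frequent; ties are broken by variable name
    so the output order is deterministic.
    """
    if not models:
        return {}
    counts = Counter(v for m in models for v in m.variables)
    n = len(models)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {v: c / n for v, c in ranked}


# Toy ensemble (illustrative values only): x1..x4 stand for the decision
# variables of sensors 1..4, with x4 belonging to the last sensor.
ensemble = [
    Model("x4 + 0.3*x3", frozenset({"x3", "x4"}), 0.05, 5),
    Model("x4*x2", frozenset({"x2", "x4"}), 0.08, 3),
    Model("x4", frozenset({"x4"}), 0.12, 1),
    Model("x1*x2*x3*x4 + x1", frozenset({"x1", "x2", "x3", "x4"}), 0.02, 11),
    Model("x3", frozenset({"x3"}), 0.40, 1),
]

selected = select_models(ensemble, max_error=0.15, max_complexity=10)
presence = variable_presence(selected)
print(presence)  # {'x4': 1.0, 'x2': 0.3333333333333333, 'x3': 0.3333333333333333}
```

Here the overly complex model and the inaccurate model are excluded before counting, so only accurate, simple models vote. A variable that appears in nearly every surviving model (here `x4`) is flagged as influential; tightening or loosening the thresholds changes which models contribute to the count.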
