Optimal row and column ordering to improve table interpretation using estimation of distribution algorithms

A common information representation task in research, education, and statistical practice is to comprehensively and intuitively express data in two-dimensional tables. Examples include tables in scientific papers, reports, and the popular press. Data is often simple enough for users to reorder. In many other cases, however, complex data patterns make it hard to find the rearrangement of rows and columns that gives the best readability. We propose that row and column ordering be regarded as a combinatorial optimization problem and solved using evolutionary computation techniques. The use of genetic algorithms has already been proposed in the literature. This paper proposes, for the first time, the use of estimation of distribution algorithms for table ordering. We also propose alternative representations of the problem that reduce its dimensionality. By learning a selective naive Bayes classifier, we determine how to jointly set the parameters of these algorithms to obtain good table orderings. The experimental examples in this paper use 2D tables.
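To make the idea of treating table ordering as an optimization problem concrete, the following is a minimal sketch of an estimation of distribution algorithm over row and column permutations. It is not the method evaluated in the paper. The cost function (sum of absolute differences between neighbouring cells), the position-wise marginal model, and all names such as `adjacency_cost` and `eda_order_table` are assumptions chosen for illustration only.

```python
import random


def adjacency_cost(table, row_order, col_order):
    """Assumed readability proxy: sum of absolute differences between
    vertically and horizontally neighbouring cells (lower is smoother)."""
    cost = 0.0
    for i, r in enumerate(row_order):
        for j, c in enumerate(col_order):
            if i + 1 < len(row_order):
                cost += abs(table[r][c] - table[row_order[i + 1]][c])
            if j + 1 < len(col_order):
                cost += abs(table[r][c] - table[r][col_order[j + 1]])
    return cost


def sample_permutation(weights, rng):
    """Draw a permutation position by position from per-position item weights,
    restricting each draw to the items not yet placed."""
    remaining = list(range(len(weights)))
    perm = []
    for pos in range(len(weights)):
        w = [weights[pos][item] for item in remaining]
        item = rng.choices(remaining, weights=w, k=1)[0]
        perm.append(item)
        remaining.remove(item)
    return perm


def estimate_weights(perms, n, smoothing=1.0):
    """Count how often each item occupies each position among the selected
    individuals; the smoothing term keeps every weight positive."""
    weights = [[smoothing] * n for _ in range(n)]
    for perm in perms:
        for pos, item in enumerate(perm):
            weights[pos][item] += 1.0
    return weights


def eda_order_table(table, pop_size=60, n_select=20, generations=40, seed=0):
    """Sample orderings, keep the best ones, re-estimate the model, repeat."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    row_w = [[1.0] * n_rows for _ in range(n_rows)]  # uniform initial model
    col_w = [[1.0] * n_cols for _ in range(n_cols)]
    best = (list(range(n_rows)), list(range(n_cols)))  # start from given order
    best_cost = adjacency_cost(table, best[0], best[1])
    for _ in range(generations):
        population = [
            (sample_permutation(row_w, rng), sample_permutation(col_w, rng))
            for _ in range(pop_size)
        ]
        population.sort(key=lambda ind: adjacency_cost(table, ind[0], ind[1]))
        selected = population[:n_select]
        top_cost = adjacency_cost(table, selected[0][0], selected[0][1])
        if top_cost < best_cost:
            best, best_cost = selected[0], top_cost
        row_w = estimate_weights([ind[0] for ind in selected], n_rows)
        col_w = estimate_weights([ind[1] for ind in selected], n_cols)
    return best[0], best[1], best_cost


if __name__ == "__main__":
    demo = [[1, 9, 2], [8, 3, 7], [2, 8, 1], [9, 2, 8]]
    rows, cols, cost = eda_order_table(demo)
    print("row order:", rows, "column order:", cols, "cost:", cost)
```

The sketch models rows and columns independently and ignores dependencies between positions, which is the simplest possible probabilistic model. Richer models and the reduced-dimensionality representations mentioned in the abstract would replace `estimate_weights` and `sample_permutation`.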
