Unsupervised Learning of Bayesian Networks via Estimation of Distribution Algorithms: An Application to Gene Expression Data Clustering

This paper proposes using estimation of distribution algorithms for unsupervised learning of Bayesian networks, both directly and within the framework of the Bayesian structural EM algorithm. Both approaches are evaluated empirically on synthetic and real data. The real-data evaluation applies the proposals to gene expression data clustering, i.e., the identification of clusters of genes with similar expression profiles across samples, using the leukemia database. Validation of the identified gene clusters suggests that they may be biologically meaningful.
