Tractable learning of Bayesian networks from partially observed data

Abstract Most real-world problems involve incomplete data. The structural expectation-maximization algorithm is the most common approach to learning Bayesian networks from incomplete datasets. Its main limitation is its high computational cost, caused mainly by the need to perform inference at each iteration of the algorithm. In this paper, we propose a new method that guarantees the efficiency of the learning process while improving the performance of the structural expectation-maximization algorithm. We address the first objective by imposing an upper bound on the treewidth of the models, which limits the complexity of inference. To enforce this bound, we use an efficient heuristic to search the space of elimination orders. For the second objective, we study the advantages of directly computing the score with respect to the observed data rather than an expectation of the score, and we provide a strategy for performing these computations efficiently within the proposed method. We perform extensive experiments on synthetic and real-world datasets of varied dimensionality, including datasets with thousands of variables and hundreds of thousands of instances. The experimental results empirically support our claims.
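
To make the link between elimination orders and treewidth concrete, the following is a minimal illustrative sketch and not the method proposed in the paper. It moralizes a small hypothetical network and computes the width induced by a given elimination order; the smallest such width over all orders equals the treewidth of the moral graph. The function names `moral_graph` and `induced_width` and the toy network are assumptions made for this example only.

```python
from itertools import combinations


def moral_graph(parents):
    """Build the undirected moral graph of a DAG given as {node: [parents]}.

    Every node must appear as a key, including nodes with no parents.
    """
    adj = {v: set() for v in parents}
    for child, pas in parents.items():
        # Connect each parent to its child.
        for p in pas:
            adj[child].add(p)
            adj[p].add(child)
        # "Marry" the parents: connect every pair of co-parents.
        for a, b in combinations(pas, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def induced_width(adj, order):
    """Return the width induced by eliminating nodes in the given order.

    The width is the largest number of remaining neighbours a node has
    at the moment it is eliminated.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    width = 0
    for v in order:
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        # Add fill-in edges so the neighbours of v form a clique.
        for a, b in combinations(nbrs, 2):
            adj[a].add(b)
            adj[b].add(a)
        # Remove v from the adjacency sets of its neighbours.
        for n in nbrs:
            adj[n].discard(v)
    return width


# Hypothetical toy network: A -> C, B -> C, C -> D.
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
graph = moral_graph(parents)

good = induced_width(graph, ["D", "A", "B", "C"])
bad = induced_width(graph, ["C", "A", "B", "D"])
print(good, bad)
```

In this toy case the first order induces width 2 and the second induces width 3, which shows why the choice of order matters: a structure search that only accepts models for which some order stays below a fixed bound keeps exact inference tractable.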
