Learning tractable Bayesian networks in the space of elimination orders

Abstract The computational complexity of inference is now one of the most relevant topics in the field of Bayesian networks. Although the literature contains approaches that learn Bayesian networks from high-dimensional datasets, traditional methods do not bound the inference complexity of the learned models and often produce models in which exact inference is intractable. This paper focuses on learning tractable Bayesian networks from data. To address this problem, we propose strategies for learning Bayesian networks in the space of elimination orders, which lets us efficiently bound the inference complexity of the networks during the learning process. Searching in the combined space of directed acyclic graphs and elimination orders can be extremely computationally demanding. We demonstrate that one type of elimination tree, which we define as valid, can be used as an equivalence class of directed acyclic graphs and elimination orders, removing this redundancy. We propose methods for incrementally compiling local changes made to directed acyclic graphs into elimination trees and for searching for elimination trees of low width. Using these methods, we can move through the space of valid elimination trees in polynomial time with respect to the number of network variables and in linear time with respect to treewidth. Experimental results show that our approach successfully bounds the inference complexity of the learned models while remaining competitive with other state-of-the-art methods in terms of fit to the data.
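As an illustration of why an elimination order bounds inference complexity, the following minimal sketch computes the width induced by a given order on the moral graph of a small directed acyclic graph. It is not the paper's algorithm: the function names `moralize` and `induced_width`, the parent-list encoding, and the toy network are all assumptions made for this example. The width of an order is the largest number of neighbors a variable has at the moment it is eliminated, and the cost of variable elimination grows exponentially in that number.

```python
from itertools import combinations


def moralize(parents):
    """Build the undirected moral graph of a DAG given as {node: [parents]}.

    Hypothetical helper: links each child to its parents and "marries"
    every pair of parents that share a child.
    """
    adj = {}
    for child, pars in parents.items():
        adj.setdefault(child, set())
        for p in pars:
            adj.setdefault(p, set())
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(pars, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def induced_width(adj, order):
    """Return the width induced by eliminating the nodes of `adj` in `order`."""
    # Work on a copy so the caller's graph is left untouched.
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    width = 0
    for v in order:
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        # Eliminating v connects all of its remaining neighbors (fill-in edges).
        for a, b in combinations(nbrs, 2):
            adj[a].add(b)
            adj[b].add(a)
        for u in nbrs:
            adj[u].discard(v)
    return width


# Toy network (an assumption for illustration): A -> C <- B, C -> D.
toy_parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
moral = moralize(toy_parents)

good_order = ["D", "A", "B", "C"]
bad_order = ["C", "A", "B", "D"]
print(induced_width(moral, good_order))
print(induced_width(moral, bad_order))
```

In this toy network, eliminating `C` first touches all three of its neighbors at once, whereas eliminating the leaf `D` first keeps every elimination step smaller. A learner that tracks an order of bounded width alongside the graph therefore has a direct handle on the inference cost of the model it is building.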
