Efficiently Approximating Markov Tree Bagging for High-Dimensional Density Estimation

We consider algorithms for learning Mixtures of Bagged Markov Trees for density estimation. In problems defined over many variables and with few available observations, such mixtures generally outperform a single Markov tree maximizing the data likelihood, but they are far more expensive to compute. In this paper, we describe new algorithms for approximating such models, with the aim of speeding up learning without sacrificing accuracy. More specifically, we propose a filtering step, obtained as a by-product of computing a first Markov tree, that avoids considering poor candidate edges in the subsequently generated trees; a sketch of this idea is given below. We compare these algorithms on synthetic data sets to Mixtures of Bagged Markov Trees, to a single Markov tree learned with the classical Chow-Liu algorithm, and to a recently proposed randomized scheme for building tree mixtures.
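To make the scheme concrete, here is a minimal Python sketch of the filtering idea under simplifying assumptions: binary variables, a uniform mixture over tree structures, and a quantile-based edge filter. The helper names (mutual_information, chow_liu_edges, bagged_filtered_mixture) and the keep_fraction parameter are hypothetical illustrations, not the authors' implementation.

```python
# A minimal sketch, assuming binary 0/1 data of shape (n_samples, n_vars).
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_information(data):
    """Pairwise empirical mutual information for binary variables."""
    n, p = data.shape
    mi = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            joint = np.zeros((2, 2))
            for a in (0, 1):
                for b in (0, 1):
                    joint[a, b] = np.mean((data[:, i] == a) & (data[:, j] == b))
            pi, pj = joint.sum(axis=1), joint.sum(axis=0)
            with np.errstate(divide="ignore", invalid="ignore"):
                terms = joint * np.log(joint / np.outer(pi, pj))
            mi[i, j] = mi[j, i] = np.nansum(terms)  # 0*log(0) terms dropped
    return mi

def chow_liu_edges(mi):
    """Chow-Liu structure: maximum-weight spanning tree on the MI graph,
    computed as a minimum spanning tree on negated weights.  Zero-weight
    (filtered) pairs are treated as absent edges, so the result may be a
    forest, which is still a valid tree-structured density model."""
    mst = minimum_spanning_tree(-mi)
    rows, cols = mst.nonzero()
    return list(zip(rows.tolist(), cols.tolist()))

def bagged_filtered_mixture(data, n_trees=10, keep_fraction=0.2, seed=0):
    """Bag Markov trees, restricting every bootstrap replicate's candidate
    edges to the strongest pairs of the initial MI matrix -- the filtering
    step reuses a by-product of building the first tree."""
    rng = np.random.default_rng(seed)
    p = data.shape[1]
    mi_full = mutual_information(data)  # by-product of the first tree
    threshold = np.quantile(mi_full[np.triu_indices(p, 1)], 1 - keep_fraction)
    mask = mi_full >= threshold         # poor candidate edges are dropped
    trees = []
    for _ in range(n_trees):
        boot = data[rng.integers(0, len(data), len(data))]
        mi = mutual_information(boot) * mask  # score only surviving edges
        trees.append(chow_liu_edges(mi))
    return trees  # uniform mixture over the bagged tree structures
```

In this sketch the filter saves work only conceptually (the masked MI entries are still computed); a faithful implementation would skip the filtered pairs entirely, which is where the sub-quadratic speed-up over plain bagging would come from.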
