Efficient approximation of probability distributions with k-order decomposable models

Over the last decades, several algorithms have been proposed for learning probability distributions based on decomposable models. Some of these algorithms can be used to search for a maximum likelihood decomposable model with a given maximum clique size, $k$. Unfortunately, the problem of learning a maximum likelihood decomposable model given a maximum clique size is NP-hard for $k > 2$. In this work, we propose the fractal tree family of algorithms, which approximates this problem with a worst-case computational complexity of $O(k^2 \cdot n^2 \cdot N)$, where $n$ is the number of random variables involved and $N$ is the size of the training set.

The fractal tree algorithms construct a sequence of maximal $i$-order decomposable graphs, for $i = 2, \ldots, k$, in $k - 1$ steps. At each step, the algorithms follow a divide-and-conquer strategy that decomposes the problem into a set of separate problems. Each separate problem is solved efficiently using the generalized Chow-Liu algorithm. Fractal trees can be considered a natural extension of the Chow-Liu algorithm from $k = 2$ to arbitrary values of $k$, and they have shown competitive behavior on the maximum likelihood problem. Because of this competitive behavior, their low computational complexity, and their modularity, which allows different parallelization strategies to be implemented, the proposed procedures are especially advisable for modeling high-dimensional domains.

Highlights: Fractal trees learn decomposable models with a bounded clique size. Fractal trees have a worst-case computational complexity of $O(k^2 \cdot n^2 \cdot N)$. We propose a prune-and-graft operator with the same computational complexity. All the provided procedures are modular and can be easily parallelized. The proposed algorithms have obtained competitive experimental results.
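For orientation, the $k = 2$ base case that fractal trees extend is the classical Chow-Liu procedure: weight every pair of variables by its empirical mutual information and keep a maximum-weight spanning tree. The following is a minimal sketch of that base case only, not of the fractal tree algorithms or the generalized Chow-Liu algorithm. It assumes discrete data stored in a NumPy array with one row per sample and one column per variable; the function names `mutual_information` and `chow_liu_tree` are illustrative and do not come from the paper.

```python
from itertools import combinations

import numpy as np


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete 1-D arrays."""
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    # Joint frequency table, normalized to an empirical joint distribution.
    joint = np.zeros((len(xs), len(ys)))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (|xs|, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, |ys|)
    nz = joint > 0  # skip empty cells to avoid log(0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def chow_liu_tree(data):
    """Edges of a maximum-weight spanning tree over the columns of `data`,
    with edge weights given by pairwise empirical mutual information."""
    n = data.shape[1]
    # Pairwise weights: n(n-1)/2 pairs, each needing one pass over N samples.
    weights = {
        (i, j): mutual_information(data[:, i], data[:, j])
        for i, j in combinations(range(n), 2)
    }

    # Kruskal's algorithm with a simple union-find structure.
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so keep it
            parent[ri] = rj
            edges.append((i, j))
    return edges


if __name__ == "__main__":
    # Hypothetical toy data: a noisy chain X0 -> X1 -> X2 plus an independent X3.
    rng = np.random.default_rng(0)
    x0 = rng.integers(0, 2, size=2000)
    x1 = np.where(rng.random(2000) < 0.1, 1 - x0, x0)
    x2 = np.where(rng.random(2000) < 0.1, 1 - x1, x1)
    x3 = rng.integers(0, 2, size=2000)
    print(chow_liu_tree(np.column_stack([x0, x1, x2, x3])))
```

In this sketch the pairwise mutual information step costs on the order of $n^2 \cdot N$ operations, which is consistent with the stated bound $O(k^2 \cdot n^2 \cdot N)$ at $k = 2$; the spanning-tree step does not depend on $N$.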
