Paper information - Tractable learning of Bayesian networks from partially observed data

Abstract: Most real-world problems involve incomplete data. The structural expectation-maximization algorithm is the most common approach to learning Bayesian networks from incomplete datasets. Its main limitation is its high computational cost, caused mainly by the need to perform inference at each iteration of the algorithm. In this paper, we propose a new method with two objectives: guaranteeing the efficiency of the learning process and improving on the performance of the structural expectation-maximization algorithm. We address the first objective by bounding the treewidth of the models, which limits the complexity of inference. To achieve this, we use an efficient heuristic to search the space of elimination orders. For the second objective, we study the advantages of computing the score directly with respect to the observed data rather than an expectation of the score, and we provide a strategy for performing these computations efficiently in the proposed method. We perform exhaustive experiments on synthetic and real-world datasets of varied dimensionality, including datasets with thousands of variables and hundreds of thousands of instances. The experimental results support our claims empirically.
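To make the link between elimination orders and treewidth concrete, the following minimal Python sketch computes an upper bound on the treewidth of a Bayesian network structure. It is an illustration only and not the heuristic proposed in the paper: it uses the standard greedy min-degree rule on the moral graph, and the function names `moralize` and `greedy_min_degree_width` and the `{node: [parents]}` input format are assumptions introduced here.

```python
from itertools import combinations


def moralize(parents):
    """Build the undirected moral graph of a DAG given as {node: [parents]}.

    Each node is linked to its parents, and all parents of a common
    child are linked to each other ("married").
    """
    adj = {}
    for child, pas in parents.items():
        adj.setdefault(child, set())
        for p in pas:
            adj.setdefault(p, set())
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(pas, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_min_degree_width(adj):
    """Return (elimination_order, width) using the greedy min-degree rule.

    The width of an elimination order is the largest number of neighbours
    a node has at the moment it is eliminated. It is an upper bound on
    the treewidth of the graph.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    order, width = [], 0
    while adj:
        # Pick the node with the fewest neighbours; break ties by name.
        v = min(adj, key=lambda u: (len(adj[u]), u))
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        # Eliminating v connects all of its remaining neighbours.
        for a, b in combinations(nbrs, 2):
            adj[a].add(b)
            adj[b].add(a)
        for n in nbrs:
            adj[n].discard(v)
        order.append(v)
    return order, width


# Hypothetical structure: A -> C <- B, C -> D
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
order, width = greedy_min_degree_width(moralize(dag))
print(order, width)
```

Under this sketch, a candidate structure would be accepted during learning only if the width found for it does not exceed the chosen bound. Since a greedy order gives only an upper bound, such a check can reject structures whose true treewidth is within the bound, but it never accepts one that exceeds it.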
