Incremental Learning of Latent Forests

In the analysis of real-world data, it is useful to learn a latent variable model that represents the data generation process. In this setting, latent tree models are particularly appealing because they can capture complex relationships while remaining easily interpretable. In this paper, we propose two incremental algorithms for learning forests of latent trees. Unlike existing methods, the proposed algorithms are based on the variational Bayesian framework, which allows them to incorporate uncertainty into the learning process and to handle mixed data. The first algorithm, the incremental learner, determines the forest structure and the cardinality of its latent variables through an iterative search process. The second algorithm, the constrained incremental learner, modifies the first by considering only a subset of the most promising structures at each step of the search. Although restricting each iteration to a fixed number of candidate models limits the search space, we demonstrate that the second algorithm returns almost identical results at a small fraction of the computational cost. We compare our algorithms with existing methods in a study using both discrete and continuous real-world data. In addition, we demonstrate the effectiveness of the proposed algorithms by applying them to data from the 2018 Spanish Living Conditions Survey. All code, data, and results are available at https://github.com/ferjorosa/incremental-latent-forests.
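The difference between the two learners can be illustrated with a generic greedy search loop. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a hill-climbing strategy, treats `score` as a stand-in for the variational Bayesian score, and ranks candidates with a hypothetical caller-supplied `proxy`, because the abstract does not say how the retained structures are chosen. Models are abstract here; the toy problem at the end searches over integer tuples rather than latent forests.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

M = TypeVar("M")  # abstract "model"; in the paper this would be a latent forest


def incremental_search(
    initial: M,
    neighbors: Callable[[M], Iterable[M]],
    score: Callable[[M], float],
    proxy: Optional[Callable[[M], float]] = None,
    max_candidates: Optional[int] = None,
    max_iters: int = 1000,
) -> Tuple[M, float, int]:
    """Greedy hill-climbing (an assumed search strategy) over candidate models.

    Unconstrained (max_candidates is None): every neighbor is fully scored.
    Constrained: neighbors are ranked by the cheap, hypothetical `proxy`
    and only the top `max_candidates` of them are fully scored.
    Returns (final model, its score, number of full score evaluations).
    """
    current, current_score = initial, score(initial)
    n_evals = 1
    for _ in range(max_iters):
        candidates: List[M] = list(neighbors(current))
        if max_candidates is not None:
            if proxy is None:
                raise ValueError("the constrained search needs a proxy ranking")
            # Keep only the candidates the proxy considers most promising.
            candidates.sort(key=proxy, reverse=True)
            candidates = candidates[:max_candidates]
        best, best_score = None, current_score
        for cand in candidates:
            s = score(cand)  # stand-in for an expensive score such as a VB bound
            n_evals += 1
            if s > best_score:
                best, best_score = cand, s
        if best is None:  # no candidate improves the score: local optimum
            break
        current, current_score = best, best_score
    return current, current_score, n_evals


# Toy problem (hypothetical): find the integer tuple closest to TARGET.
TARGET = (3, 1, 4, 1, 5)


def toy_neighbors(x):
    # Change exactly one coordinate by +1 or -1.
    for i in range(len(x)):
        for d in (-1, 1):
            y = list(x)
            y[i] += d
            yield tuple(y)


def toy_score(x):
    # "Expensive" score: negative squared distance to TARGET.
    return -sum((a - t) ** 2 for a, t in zip(x, TARGET))


def toy_proxy(x):
    # "Cheap" ranking criterion: negative L1 distance to TARGET.
    return -sum(abs(a - t) for a, t in zip(x, TARGET))


full = incremental_search((0,) * 5, toy_neighbors, toy_score)
constrained = incremental_search(
    (0,) * 5, toy_neighbors, toy_score, proxy=toy_proxy, max_candidates=2
)
print(full)         # ((3, 1, 4, 1, 5), 0, 151)
print(constrained)  # ((3, 1, 4, 1, 5), 0, 31)
```

On this toy problem, the constrained run with `max_candidates=2` reaches the same optimum as the unconstrained run while making 31 full score evaluations instead of 151. This only loosely mirrors the trade-off claimed for the constrained incremental learner; real savings depend on how expensive the score is and how well the ranking criterion anticipates it.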
