Learning Multivariate Distributions by Competitive Assembly of Marginals

We present a new framework for learning high-dimensional multivariate probability distributions from estimated marginals. The approach is motivated by compositional models and Bayesian networks and is designed to adapt to small sample sizes. We start with a large, overlapping set of elementary statistical building blocks, or “primitives,” which are low-dimensional marginal distributions learned from data. Each variable may appear in many primitives. Subsets of primitives are combined in a Lego-like fashion to construct a probabilistic graphical model; only a small fraction of the primitives will participate in any valid construction. Because primitives can be precomputed, parameter estimation and structure search are separated. Model complexity is controlled by strong biases: we adapt the primitives to the amount of training data and impose rules that restrict how they may be merged into allowable compositions. The likelihood of the data decomposes into a sum of local gains, one for each primitive in the final structure. We focus on a specific subclass of networks, namely binary forests. Structure optimization corresponds to an integer linear program, and the maximizing composition can be computed for reasonably large numbers of variables. Performance is evaluated using both synthetic data and real datasets from natural language processing and computational biology.
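
The claim that the likelihood decomposes into one local gain per primitive can be illustrated in its simplest setting. The sketch below assumes discrete variables, only pairwise primitives, a forest in which each variable has at most one parent, and parameters set to empirical frequencies. Under these assumptions the gain of an edge is n times the empirical mutual information of its two variables (the classical Chow-Liu identity), and the forest's log-likelihood equals the fully independent log-likelihood plus the sum of edge gains. This is not the paper's full construction, which uses richer primitives, merging rules, and an integer linear program; all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Plug-in (empirical) entropy in nats from a Counter of observed outcomes."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def local_gain(data, i, j):
    """Hypothetical gain of a pairwise primitive on (i, j): n * empirical mutual information."""
    n = len(data)
    h_i = entropy(Counter(row[i] for row in data), n)
    h_j = entropy(Counter(row[j] for row in data), n)
    h_ij = entropy(Counter((row[i], row[j]) for row in data), n)
    return n * (h_i + h_j - h_ij)


def independent_loglik(data):
    """Log-likelihood of the fully factorized model with empirical marginals."""
    n, d = len(data), len(data[0])
    return -n * sum(entropy(Counter(row[k] for row in data), n) for k in range(d))


def forest_loglik(data, edges):
    """Direct log-likelihood of a forest model with empirical conditional probabilities.

    edges: list of (parent, child) pairs; each variable may have at most one parent.
    """
    n, d = len(data), len(data[0])
    parent = {child: par for par, child in edges}
    total = 0.0
    for k in range(d):
        if k in parent:
            p = parent[k]
            joint = Counter((row[p], row[k]) for row in data)
            marg = Counter(row[p] for row in data)
            # sum over rows of log P(x_k | x_p) = sum_{a,b} N(a,b) log(N(a,b) / N(a))
            total += sum(c * math.log(c / marg[a]) for (a, _), c in joint.items())
        else:
            marg = Counter(row[k] for row in data)
            # root variable: sum over rows of log P(x_k) = sum_b N(b) log(N(b) / n)
            total += sum(c * math.log(c / n) for c in marg.values())
    return total


if __name__ == "__main__":
    data = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0)]
    edges = [(0, 1), (0, 2)]  # hypothetical forest: X0 -> X1, X0 -> X2
    direct = forest_loglik(data, edges)
    decomposed = independent_loglik(data) + sum(local_gain(data, p, c) for p, c in edges)
    print(direct, decomposed)  # equal up to floating-point rounding
```

Because each gain is nonnegative and depends only on its own primitive, comparing candidate structures reduces to comparing sums of precomputed local terms. With pairwise primitives alone, maximizing that sum over forests is the maximum-weight spanning forest problem solved by Chow and Liu; the richer rules described above are what call for an integer linear program.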
