Learning Bayesian Networks: The Combination of Knowledge and Statistical Data

We describe a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. First and foremost, we develop a methodology for assessing the informative priors needed for learning. Our approach is derived from a set of previously made assumptions together with the assumption of likelihood equivalence, which says that data should not help to discriminate between network structures that represent the same assertions of conditional independence. We show that likelihood equivalence, when combined with the previously made assumptions, implies that the user's priors for network parameters can be encoded in a single Bayesian network for the next case to be seen (a prior network) and a single measure of confidence for that network. Second, using these priors, we show how to compute the relative posterior probabilities of network structures given data. Third, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most k = 1 parent. For the general case (k > 1), which is NP-hard, we review heuristic search algorithms including local search, iterative local search, and simulated annealing. Finally, we describe a methodology for evaluating Bayesian-network learning algorithms, and apply it to a comparison of various learning approaches.
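To make the idea of likelihood equivalence concrete, the following is a minimal sketch, not the paper's implementation. It assumes complete discrete data and a uniform prior network, so that every Dirichlet exponent is the confidence measure (here called `ess`, an equivalent sample size) divided evenly over parent configurations and child states. The names `log_bde_family`, `log_bde`, `ess`, and the toy data are hypothetical and chosen only for illustration.

```python
from math import lgamma


def log_bde_family(data, child, parents, arity, ess=1.0):
    """Log marginal likelihood of one node given its parents.

    Assumes a uniform prior network (an illustrative simplification):
    each Dirichlet exponent is ess / (r * q), where r is the number of
    child states and q the number of parent configurations.
    """
    r = arity[child]
    q = 1
    for p in parents:
        q *= arity[p]
    a_ij = ess / q          # prior weight of one parent configuration
    a_ijk = ess / (r * q)   # prior weight of one (configuration, state) pair

    # Count child states within each observed parent configuration.
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        counts.setdefault(key, [0] * r)[row[child]] += 1

    # Unobserved configurations contribute exactly zero, so they are skipped.
    total = 0.0
    for n_ijk in counts.values():
        total += lgamma(a_ij) - lgamma(a_ij + sum(n_ijk))
        for n in n_ijk:
            total += lgamma(a_ijk + n) - lgamma(a_ijk)
    return total


def log_bde(data, structure, arity, ess=1.0):
    """Sum family scores; `structure` maps each node to a tuple of parents."""
    return sum(
        log_bde_family(data, child, parents, arity, ess)
        for child, parents in structure.items()
    )


# Toy data set over two binary variables (made up for illustration).
arity = {"x": 2, "y": 2}
data = (
    [{"x": 0, "y": 0}] * 12
    + [{"x": 1, "y": 1}] * 9
    + [{"x": 0, "y": 1}] * 3
    + [{"x": 1, "y": 0}] * 2
)

x_to_y = {"x": (), "y": ("x",)}   # x -> y
y_to_x = {"y": (), "x": ("y",)}   # y -> x
no_arc = {"x": (), "y": ()}       # x and y independent

score_xy = log_bde(data, x_to_y, arity)
score_yx = log_bde(data, y_to_x, arity)
score_empty = log_bde(data, no_arc, arity)
```

The structures `x_to_y` and `y_to_x` represent the same (empty) set of conditional-independence assertions, so under this scoring scheme `score_xy` and `score_yx` should agree up to floating-point error, while `score_empty` is free to differ. A search procedure such as local search would compare these log scores, plus a log structure prior, across candidate structures.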


[32]  Herbert A. Simon,et al.  Causality in Bayesian Belief Networks , 1993, UAI.

[33]  Joe Suzuki,et al.  A Construction of Bayesian Networks from Databases Based on an MDL Principle , 1993, UAI.

[34]  Richard E. Korf,et al.  Linear-Space Best-First Search , 1993, Artif. Intell..

[35]  Bruce Abramson,et al.  Deriving A Minimal itI-map of a Belief Network Relative to a Target Ordering of its Nodes , 1993, UAI.

[36]  David J. Spiegelhalter,et al.  Bayesian analysis in expert systems , 1993 .

[37]  Klaus-Uwe Höffgen,et al.  Learning and robust learning of product distributions , 1993, COLT '93.

[38]  A. Dawid,et al.  Hyper Markov Laws in the Statistical Analysis of Decomposable Graphical Models , 1993 .

[39]  Wai Lam,et al.  Using Causal Information and Local Measures to Learn Bayesian Networks , 1993, UAI.

[40]  Moninder Singh,et al.  An Algorithm for the Construction of Bayesian Network Structures from Data , 1993, UAI.

[41]  Bruce D'Ambrosio,et al.  Local Expression Languages for Probabilistic Dependence: a Preliminary Report , 1994, UAI 1994.

[42]  David Heckerman,et al.  Learning Gaussian Networks , 1994, UAI.

[43]  Ross D. Shachter,et al.  A Decision-based View of Causality , 1994, UAI.

[44]  D. Madigan,et al.  Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam's Window , 1994 .

[45]  David Heckerman,et al.  Learning Bayesian Networks: Search Methods and Experimental Results , 1995 .

[46]  David Maxwell Chickering,et al.  Learning Bayesian networks: The combination of knowledge and statistical data , 1995, Mach. Learn..

[47]  Ross D. Shachter,et al.  A Definition and Graphical Representation for Causality , 1995, UAI.

[48]  J. Pearl Causal diagrams for empirical research , 1995 .

[49]  David Heckerman,et al.  Learning Bayesian Networks: A Unification for Discrete and Gaussian Domains , 1995, UAI.

[50]  Bruce D'Ambrosio,et al.  Local expression languages for probabilistic dependence , 1995, Int. J. Approx. Reason..

[51]  Christopher Meek,et al.  Learning Bayesian Networks with Discrete Variables from Data , 1995, KDD.

[52]  David Maxwell Chickering,et al.  A Transformational Characterization of Equivalent Bayesian Network Structures , 1995, UAI.

[53]  David Heckerman,et al.  A Characterization of the Dirichlet Distribution with Application to Learning Bayesian Networks , 1995, UAI.

[54]  David Heckerman,et al.  A Bayesian Approach to Learning Causal Networks , 1995, UAI.

[55]  David Maxwell Chickering,et al.  Learning Equivalence Classes of Bayesian Network Structures , 1996, UAI.

[56]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[57]  Huaiyu Zhu On Information and Sufficiency , 1997 .

[58]  Steffen L. Lauritzen,et al.  Lectures on Contingency Tables , 2002 .

[59]  Tom Burr,et al.  Causation, Prediction, and Search , 2003, Technometrics.

[60]  Hector J. Levesque,et al.  Knowledge Representation and Reasoning , 2004 .

[61]  Gregory F. Cooper,et al.  A Bayesian method for the induction of probabilistic networks from data , 1992, Machine Learning.

[62]  Ronald A. Howard,et al.  Influence Diagrams , 2005, Decis. Anal..

[63]  D. Hinkley Annals of Statistics , 2006 .

[64]  K. Roeder,et al.  Journal of the American Statistical Association: Comment , 2006 .

[65]  Journal of Chemical Physics , 1932, Nature.