Learning Bayesian networks: The combination of knowledge and statistical data

We describe a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. First and foremost, we develop a methodology for assessing informative priors needed for learning. Our approach is derived from a set of assumptions made previously as well as the assumption of likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence. We show that likelihood equivalence, when combined with previously made assumptions, implies that the user's priors for network parameters can be encoded in a single Bayesian network for the next case to be seen (a prior network) and a single measure of confidence for that network. Second, using these priors, we show how to compute the relative posterior probabilities of network structures given data. Third, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most k = 1 parent. For the general case (k > 1), which is NP-hard, we review heuristic search algorithms including local search, iterative local search, and simulated annealing. Finally, we describe a methodology for evaluating Bayesian-network learning algorithms, and apply this approach to a comparison of various approaches.
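To make the scoring step concrete, here is a minimal sketch of the log marginal likelihood log p(D | S) for complete discrete data under Dirichlet parameter priors. It assumes the special case where the prior network is uniform over all joint states, so every prior count is N'_ijk = N' / (r_i q_i) (a choice often called BDeu); N' is the equivalent sample size, playing the role of the single confidence measure. Each family contributes log Γ(N'_ij) - log Γ(N'_ij + N_ij) + Σ_k [log Γ(N'_ijk + N_ijk) - log Γ(N'_ijk)]. The names `family_score`, `structure_score`, and `ess` are illustrative, not taken from the paper.

```python
from collections import Counter
from itertools import product
from math import lgamma


def family_score(data, child, parents, arity, ess):
    """Log marginal likelihood of one node given its parents.

    Assumes a uniform prior network, so each prior count is
    N'_ijk = ess / (r_i * q_i) (the BDeu special case).
    """
    r = arity[child]                      # r_i: number of child states
    q = 1
    for p in parents:
        q *= arity[p]                     # q_i: number of parent configurations
    a_jk = ess / (r * q)                  # N'_ijk
    a_j = ess / q                         # N'_ij = sum over k of N'_ijk
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    for j in product(*(range(arity[p]) for p in parents)):
        n_j = sum(counts[(j, k)] for k in range(r))
        score += lgamma(a_j) - lgamma(a_j + n_j)
        for k in range(r):
            score += lgamma(a_jk + counts[(j, k)]) - lgamma(a_jk)
    return score


def structure_score(data, structure, arity, ess):
    """log p(D | S) decomposes into a sum of per-family scores."""
    return sum(family_score(data, v, ps, arity, ess) for v, ps in structure.items())


# Hypothetical data over binary X (index 0) and Y (index 1).
data = [(0, 0)] * 6 + [(0, 1)] * 3 + [(1, 0)] * 1 + [(1, 1)] * 10
arity = {0: 2, 1: 2}
x_to_y = structure_score(data, {0: [], 1: [0]}, arity, ess=1.0)
y_to_x = structure_score(data, {0: [1], 1: []}, arity, ess=1.0)
empty = structure_score(data, {0: [], 1: []}, arity, ess=1.0)
```

Even though the marginal counts of X and Y differ, X -> Y and Y -> X receive the same score, which is what likelihood equivalence predicts for structures encoding the same independencies. Adding log p(S) to each score gives relative log posterior probabilities of structures.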
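For the k = 1 case, one polynomial approach can be sketched as follows, assuming a decomposable metric that is score equivalent. Define the edge weight w(i, j) = score(x_j | {x_i}) - score(x_j | no parents). Score equivalence gives score(x_i | no parents) + score(x_j | {x_i}) = score(x_j | no parents) + score(x_i | {x_j}), so w(i, j) = w(j, i). The best structure with at most one parent per node then corresponds to a maximum-weight spanning forest over the positive-weight edges, found here with Kruskal's algorithm; orienting each tree away from any root gives an equivalent structure. Without score equivalence the weights are asymmetric and a maximum-branching algorithm (Edmonds') would be needed instead. The weight matrix and the names `max_weight_forest` and `orient` are hypothetical.

```python
def max_weight_forest(n, weight):
    """Kruskal's algorithm restricted to edges with positive symmetric weight."""
    root = list(range(n))  # union-find parent pointers

    def find(u):
        while root[u] != u:
            root[u] = root[root[u]]  # path halving
            u = root[u]
        return u

    edges = sorted(
        ((weight[i][j], i, j) for i in range(n) for j in range(i + 1, n) if weight[i][j] > 0),
        reverse=True,
    )
    forest = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # skip edges that would close a cycle
            root[ri] = rj
            forest.append((i, j))
    return forest


def orient(n, forest):
    """Direct each tree away from its lowest-numbered node; map child -> parent (None for roots)."""
    adj = {v: [] for v in range(n)}
    for i, j in forest:
        adj[i].append(j)
        adj[j].append(i)
    parent = {}
    for r in range(n):
        if r in parent:
            continue
        parent[r] = None
        stack = [r]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    stack.append(v)
    return parent


# Hypothetical symmetric weights w(i, j) for four variables.
W = [
    [0.0, 3.0, 0.5, -1.0],
    [3.0, 0.0, 2.0, 0.1],
    [0.5, 2.0, 0.0, -0.2],
    [-1.0, 0.1, -0.2, 0.0],
]
forest = max_weight_forest(4, W)
parents = orient(4, forest)
```

Edges with nonpositive weight are never added, because including them could only lower the total score; a variable with no positive-weight edge simply stays parentless.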
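For the general case, local search can be sketched as greedy hill climbing over acyclic structures: start from the empty graph, repeatedly apply the single arc addition, deletion, or reversal that most increases the score, and stop when no change helps. This is a minimal sketch rather than the authors' implementation; `toy_score` is a hypothetical stand-in for a decomposable per-family Bayesian score, and all function names are illustrative.

```python
from itertools import permutations


def is_acyclic(n, parents):
    """Repeatedly remove nodes none of whose parents remain; a cycle blocks removal."""
    remaining = set(range(n))
    while remaining:
        ready = {v for v in remaining if not (parents[v] & remaining)}
        if not ready:
            return False
        remaining -= ready
    return True


def neighbors(n, parents):
    """Yield every acyclic structure one arc addition, deletion, or reversal away."""
    for i, j in permutations(range(n), 2):
        new = {v: set(ps) for v, ps in parents.items()}
        if i in parents[j]:
            new[j].discard(i)            # delete i -> j
            yield new
            rev = {v: set(ps) for v, ps in new.items()}
            rev[i].add(j)                # reverse to j -> i
            if is_acyclic(n, rev):
                yield rev
        elif j not in parents[i]:
            new[j].add(i)                # add i -> j
            if is_acyclic(n, new):
                yield new


def local_search(n, family_score):
    """Greedy hill climbing from the empty graph using a decomposable score."""
    def total(ps):
        return sum(family_score(v, frozenset(ps[v])) for v in range(n))

    parents = {v: set() for v in range(n)}
    best = total(parents)
    while True:
        cand = max(neighbors(n, parents), key=total, default=None)
        if cand is None or total(cand) <= best:
            return parents, best
        parents, best = cand, total(cand)


# Hypothetical toy score: penalize each family's distance from a target parent set.
target = {0: frozenset(), 1: frozenset({0}), 2: frozenset({1})}


def toy_score(v, ps):
    return -len(ps ^ target[v])


structure, score = local_search(3, toy_score)
```

For clarity the total score is recomputed for every neighbor; because the metric is decomposable, a practical implementation would rescore only the one or two families that an arc change touches. Iterated local search and simulated annealing differ mainly in how they escape the local maxima where this loop stops.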
