A Guide to the Literature on Learning Probabilistic Networks from Data

This literature review discusses methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. It draws connections between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts for learning and for Bayesian networks are introduced, and methods are then reviewed for learning the parameters of a probabilistic network, for learning its structure, and for learning hidden variables. The article avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples.
