Probabilistic graphical models in artificial intelligence

In this paper, we review the role of probabilistic graphical models in artificial intelligence. We begin with an account of the early years, when the suitability of probability for intelligent systems was the subject of considerable controversy. We then discuss the main milestones in the foundations of graphical models, starting with Pearl's pioneering work. Some of the main techniques for problem solving (abduction, classification, and decision-making) are briefly explained. Finally, we propose some important challenges for future research and highlight relevant applications: forensic reasoning, genomics, and the use of graphical models as a general optimization tool.
