A review of undirected and acyclic directed Gaussian Markov model selection and estimation

Markov models lie at the interface between statistical independence in a probability distribution and graph separation properties. We review model selection and estimation in acyclic directed and undirected Markov models with Gaussian parametrization, emphasizing their main similarities and differences. As we highlight, these two model classes share the same foundations but are not equivalent. We report existing results with a unified notation and terminology, taking into account literature from both the artificial intelligence and statistics research communities, which first developed these models. Finally, we point out the main active research areas and open problems regarding these traditional, albeit rich, Markov models.
