Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons

In exciting recent work, Bertsimas, King and Mazumder (Ann. Statist. 44 (2016) 813–852) showed that the classical best subset selection problem in regression modeling can be formulated as a mixed integer optimization (MIO) problem. Using recent advances in MIO algorithms, they demonstrated that best subset selection can now be solved at much larger problem sizes than what was thought possible in the statistics community. They presented empirical comparisons of best subset with other popular variable selection procedures, in particular, the lasso and forward stepwise selection. Surprisingly (to us), their simulations suggested that best subset consistently outperformed both methods in terms of prediction accuracy. Here, we present an expanded set of simulations to shed more light on these comparisons. The summary is roughly as follows:

• neither best subset nor the lasso uniformly dominates the other, with best subset generally performing better in very high signal-to-noise ratio (SNR) regimes, and the lasso better in low SNR regimes;
• for a large proportion of the settings considered, best subset and forward stepwise perform similarly, but in certain cases in the high SNR regime, best subset performs better;
• forward stepwise and best subset tend to yield sparser models (when tuned on a validation set), especially in the high SNR regime;
• the relaxed lasso (actually, a simplified version of the original relaxed estimator defined in Meinshausen (Comput. Statist. Data Anal. 52 (2007) 374–393)) is the overall winner, performing just about as well as the lasso in low SNR scenarios, and nearly as well as best subset in high SNR scenarios.
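
The abstract does not spell out the simplified relaxed lasso. One reading, assumed here, is a blend of the lasso solution and the least squares fit restricted to the lasso's active set, controlled by a mixing weight. The sketch below illustrates that reading in plain NumPy. The function names, the coordinate descent solver, the SNR definition in the simulation, and the values of `lam` and `gamma` are all illustrative assumptions, not the authors' code or settings.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent (illustrative solver).

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1, with no intercept.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # residual y - X @ beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (excluding j).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def relaxed_lasso(X, y, lam, gamma):
    """Assumed simplified relaxed lasso.

    Returns gamma * (lasso fit) + (1 - gamma) * (least squares on the lasso
    active set). gamma = 1 gives the lasso; gamma = 0 gives the refit.
    """
    beta_lasso = lasso_cd(X, y, lam)
    active = np.flatnonzero(beta_lasso)
    beta_ls = np.zeros_like(beta_lasso)
    if active.size:
        beta_ls[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
    return gamma * beta_lasso + (1.0 - gamma) * beta_ls


def rss(X, y, beta):
    """Training residual sum of squares."""
    return float(np.sum((y - X @ beta) ** 2))


# Small simulated example. SNR is taken here (an assumption) to be
# Var(x^T beta) / sigma^2 with uncorrelated standard normal predictors.
rng = np.random.default_rng(0)
n, p, s, snr = 100, 20, 5, 2.0
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 1.0
sigma = np.sqrt(beta_true @ beta_true / snr)
y = X @ beta_true + sigma * rng.standard_normal(n)

lam = 0.2
fits = {g: relaxed_lasso(X, y, lam, g) for g in (1.0, 0.5, 0.0)}
```

In practice both `lam` and `gamma` would be tuned on a validation set, as the abstract describes for the other methods. Smaller `gamma` undoes more of the lasso's shrinkage on the selected variables, which is one way to understand why such an estimator could track the lasso at low SNR and move toward best subset behavior at high SNR.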

[1]  P. Pattison,et al.  New Specifications for Exponential Random Graph Models , 2006 .

[2]  Alberto Caimo,et al.  Bayesian inference for exponential random graph models , 2010, Soc. Networks.

[3]  Trevor Hastie,et al.  Statistical Learning with Sparsity: The Lasso and Generalizations , 2015 .

[4]  Dimitris Bertsimas,et al.  Multivariate Statistics and Machine Learning Under a Modern Optimization Lens , 2015 .

[5]  Mei Yin,et al.  Phase transitions in exponential random graphs , 2011, 1108.0649.

[6]  H. Zou,et al.  One-step Sparse Estimates in Nonconcave Penalized Likelihood Models. , 2008, Annals of statistics.

[7]  Daniela Witten,et al.  EXACT SPIKE TRAIN INFERENCE VIA ℓ0 OPTIMIZATION. , 2017, The annals of applied statistics.

[8]  Emmanuel Lazega,et al.  Multilevel Network Analysis for the Social Sciences; Theory, Methods and Applications , 2016 .

[9]  H. Hartley,et al.  A "super-population viewpoint' for finite population sampling. , 1975, Biometrics.

[10]  S. Portnoy Asymptotic Behavior of Likelihood Methods for Exponential Families when the Number of Parameters Tends to Infinity , 1988 .

[11]  Carter T. Butts,et al.  Comparative Exploratory Analysis of Intrinsically Disordered Protein Dynamics Using Machine Learning and Network Analytic Methods , 2019, Front. Mol. Biosci..

[12]  Yves F. Atchad'e,et al.  On Russian Roulette Estimates for Bayesian Inference with Doubly-Intractable Likelihoods , 2013, 1306.4032.

[13]  Vishesh Karwa,et al.  DERGMs: Degeneracy-restricted exponential random graph models , 2016, ArXiv.

[14]  J. S. Hunter,et al.  Partially Replicated Latin Squares , 1955 .

[15]  A. Belloni,et al.  Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming , 2010, 1009.5689.

[16]  C. Priebe,et al.  Universally consistent vertex classification for latent positions graphs , 2012, 1212.1182.

[17]  Stephen E. Fienberg,et al.  A Brief History of Statistical Models for Network Analysis and Open Challenges , 2012 .

[18]  David Maxwell Chickering,et al.  Optimal Structure Identification With Greedy Search , 2003, J. Mach. Learn. Res..

[19]  Bernardo A. Huberman,et al.  Predicting the Future with Social Media , 2010, 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.

[20]  Martina Morris,et al.  Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects. , 2008, Journal of statistical software.

[21]  Douglas D. Heckathorn,et al.  Respondent-driven sampling : A new approach to the study of hidden populations , 1997 .

[22]  P. Diaconis,et al.  Estimating and understanding exponential random graph models , 2011, 1102.2650.

[23]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[24]  Peter D. Hoff,et al.  Fast Inference for the Latent Space Network Model Using a Case-Control Approximate Likelihood , 2012, Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America.

[25]  S. Geer,et al.  The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso) , 2011 .

[26]  Antonietta Mira,et al.  Fast Maximum Likelihood Estimation via Equilibrium Expectation for Large Network Data , 2018, Scientific Reports.

[27]  Zhifeng Zhang,et al.  Adaptive time-frequency decompositions with matching pursuit , 1994, Defense, Security, and Sensing.

[28]  J. Jonasson The random triangle model , 1999, Journal of Applied Probability.

[29]  Peter D. Hoff Random Effects Models for Network Data , 2003 .

[30]  A. P. Dawid,et al.  Likelihood and Bayesian Inference from Selectively Reported Data , 1977 .

[31]  Cun-Hui Zhang Nearly unbiased variable selection under minimax concave penalty , 2010, 1002.4734.

[32]  S. Lauritzen Exchangeable Rasch Matrices∗ , 2007 .

[33]  Sylvia Richardson,et al.  High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking , 2018, Statistics and Computing.

[34]  Pavel N Krivitsky,et al.  On the Question of Effective Sample Size in Network Modeling: An Asymptotic Inquiry. , 2011, Statistical science : a review journal of the Institute of Mathematical Statistics.

[35]  Suman Chakraborty,et al.  Weighted Exponential Random Graph Models: Scope and Large Network Limits , 2017 .

[36]  Sumit Mukherjee,et al.  Phase transition in the two star Exponential Random Graph Model , 2013 .

[37]  Arthur E. Hoerl,et al.  Ridge Regression: Biased Estimation for Nonorthogonal Problems , 2000, Technometrics.

[38]  Zoran Obradovic,et al.  A decoupled exponential random graph model for prediction of structure and attributes in temporal social networks , 2011, Stat. Anal. Data Min..

[39]  T. Snijders,et al.  Conditional maximum likelihood estimation under various specifications of exponential random graph models , 2002 .

[40]  Dimitris Bertsimas,et al.  Logistic Regression: From Art to Science , 2017 .

[41]  Jenine K. Harris An Introduction to Exponential Random Graph Modeling , 2013 .

[42]  Michael Salter-Townshend,et al.  Role Analysis in Networks Using Mixtures of Exponential Random Graph Models , 2015, Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America.

[43]  David Strauss On a general class of models for interaction , 1986 .

[44]  Emmanuel Lazega,et al.  Embeddedness as a multilevel problem: A case study in economic sociology , 2016, Soc. Networks.

[45]  J. Lafferty,et al.  High-dimensional Ising model selection using ℓ1-regularized logistic regression , 2010, 1010.0311.

[46]  David L. Donoho,et al.  Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing , 2009, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[47]  P. Pattison,et al.  Conditional estimation of exponential random graph models from snowball sampling designs , 2013 .

[48]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[49]  Göran Kauermann,et al.  Stable exponential random graph models with non-parametric components for large dense networks , 2016, Soc. Networks.

[50]  J. Møller,et al.  An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants , 2006 .

[51]  Pavel N Krivitsky,et al.  Fitting Position Latent Cluster Models for Social Networks with latentnet. , 2008, Journal of statistical software.

[52]  Stéphane Mallat,et al.  Matching pursuits with time-frequency dictionaries , 1993, IEEE Trans. Signal Process..

[53]  B. Efron THE GEOMETRY OF EXPONENTIAL FAMILIES , 1978 .

[54]  Yada Zhu,et al.  Domain Adaptive Multi-Modality Neural Attention Network for Financial Forecasting , 2020, WWW.

[55]  Zhenyu Tan,et al.  The Tree Ensemble Layer: Differentiability meets Conditional Computation , 2020, ICML.

[56]  Aleksandra B. Slavkovic,et al.  Sharing social network data: differentially private estimation of exponential family random‐graph models , 2015, ArXiv.

[57]  S. Geer,et al.  On the conditions used to prove oracle results for the Lasso , 2009, 0910.0722.

[58]  T. Yan,et al.  Asymptotics in Undirected Random Graph Models Parameterized by the Strengths of Vertices , 2015 .

[59]  Martina Morris,et al.  ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks. , 2008, Journal of statistical software.

[60]  Guy Bresler,et al.  Mixing Time of Exponential Random Graphs , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[61]  Jiashun Jin,et al.  FAST COMMUNITY DETECTION BY SCORE , 2012, 1211.5803.

[62]  Peter D. Hoff,et al.  A hierarchical eigenmodel for pooled covariance estimation , 2008, 0804.0031.

[63]  M. Morris,et al.  INFERENCE FOR SOCIAL NETWORK MODELS FROM EGOCENTRICALLY SAMPLED DATA, WITH APPLICATION TO UNDERSTANDING PERSISTENT RACIAL DISPARITIES IN HIV PREVALENCE IN THE US. , 2017, The annals of applied statistics.

[64]  Yudong Chen,et al.  Harnessing Structures in Big Data via Guaranteed Low-Rank Matrix Estimation: Recent Theory and Fast Algorithms via Convex and Nonconvex Optimization , 2018, IEEE Signal Processing Magazine.

[65]  R. R. Hocking,et al.  Selection of the Best Subset in Regression Analysis , 1967 .

[66]  Mark S Handcock,et al.  Improving Simulation-Based Algorithms for Fitting ERGMs , 2012, Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America.

[67]  Michael Schweinberger,et al.  Consistent structure estimation of exponential-family random graph models with block structure , 2017, 1702.07801.

[68]  A. Bhattacharya,et al.  Bayes Shrinkage at GWAS scale: Convergence and Approximation Theory of a Scalable MCMC Algorithm for the Horseshoe Prior , 2017, 1705.00841.

[69]  D. Donoho,et al.  Basis pursuit , 1994, Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers.

[70]  C. Butts A Relational Event Framework for Social Action , 2010 .

[71]  Jaewoo Park,et al.  Bayesian Inference in the Presence of Intractable Normalizing Functions , 2017, Journal of the American Statistical Association.

[72]  Emmanuel Lazega,et al.  Multiplexity, generalized exchange and cooperation in organizations: a case study , 1999, Soc. Networks.

[73]  S. Pandey,et al.  What Are Degrees of Freedom , 2008 .

[74]  Tom A. B. Snijders Conditional Marginalization for Exponential Random Graph Models , 2010 .

[75]  M. Ruiz Espejo Sampling , 2013, Encyclopedic Dictionary of Archaeology.

[76]  D. Donoho For most large underdetermined systems of linear equations the minimal 𝓁1‐norm solution is also the sparsest solution , 2006 .

[77]  Eric P. Xing,et al.  Discrete Temporal Models of Social Networks , 2006, SNA@ICML.

[78]  Laura M. Koehly,et al.  Multilevel models for social networks: Hierarchical Bayesian approaches to exponential random graph modeling , 2016, Soc. Networks.

[79]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[80]  Chenlei Leng,et al.  Asymptotics in directed exponential random graph models with an increasing bi-degree sequence , 2014, 1408.1156.

[81]  Alberto Caimo,et al.  Bayesian exponential random graph models with nodal random effects , 2014, Soc. Networks.

[82]  D. Hunter,et al.  Inference in Curved Exponential Family Models for Networks , 2006 .

[83]  Carter T Butts,et al.  A Novel Simulation Method for Binary Discrete Exponential Families, With Application to Social Networks , 2015, The Journal of mathematical sociology.

[84]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[85]  George C. Homans Human Group , 2018, Encyclopedia of Social Network Analysis and Mining. 2nd Ed..

[86]  Harrison H. Zhou,et al.  Minimax estimation with thresholding and its application to wavelet analysis , 2005, math/0504503.

[87]  Ian Fellows,et al.  Removing Phase Transitions from Gibbs Measures , 2017, AISTATS.

[88]  Emily B. Fox,et al.  Sparse graphs using exchangeable random measures , 2014, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[89]  Christian P. Robert,et al.  Bayesian computation for statistical models with intractable normalizing constants , 2008, 0804.3152.

[90]  Ian Fellows,et al.  Exponential-family Random Network Models , 2012, 1208.0121.

[91]  George E. P. Box,et al.  A CONFIDENCE REGION FOR THE SOLUTION OF A SET OF SIMULTANEOUS EQUATIONS WITH AN APPLICATION TO EXPERIMENTAL DESIGN , 1954 .

[92]  Pavel N Krivitsky,et al.  Exponential-family random graph models for valued networks. , 2011, Electronic journal of statistics.

[93]  Shie Mannor,et al.  Robustness and Regularization of Support Vector Machines , 2008, J. Mach. Learn. Res..

[94]  Carter T. Butts,et al.  A dynamic process interpretation of the sparse ERGM reference model , 2018, The Journal of Mathematical Sociology.

[95]  E. Lehmann Elements of large-sample theory , 1998 .

[96]  Minas Gjoka,et al.  Estimating Subgraph Frequencies with or without Attributes from Egocentrically Sampled Data , 2015, ArXiv.

[97]  Joel A. Tropp,et al.  Greed is good: algorithmic results for sparse approximation , 2004, IEEE Transactions on Information Theory.

[98]  Alan M. Frieze,et al.  Random graphs , 2006, SODA '06.

[99]  S. Wasserman,et al.  Logit models and logistic regressions for social networks: III. Valued relations , 1999 .

[100]  Dimitris Bertsimas,et al.  Scalable holistic linear regression , 2019, Oper. Res. Lett..

[101]  Zack W. Almquist,et al.  A Flexible Parameterization for Baseline Mean Degree in Multiple-Network ERGMs , 2015, The Journal of mathematical sociology.

[102]  Prateek Jain,et al.  On Iterative Hard Thresholding Methods for High-dimensional M-Estimation , 2014, NIPS.

[103]  Tong Zhang,et al.  Sparse Recovery With Orthogonal Matching Pursuit Under RIP , 2010, IEEE Transactions on Information Theory.

[104]  A. Montanari,et al.  Fundamental barriers to high-dimensional regression with convex penalties , 2019, The Annals of Statistics.

[105]  P. Diaconis,et al.  Graph limits and exchangeable random graphs , 2007, 0712.2749.

[106]  Terence Tao,et al.  The Dantzig selector: Statistical estimation when P is much larger than n , 2005, math/0506081.

[107]  Nikolaos V. Sahinidis,et al.  A Discussion on Practical Considerations with Sparse Regression Methodologies , 2020, Statistical Science.

[108]  Ove Frank,et al.  http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained , 2007 .

[109]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[110]  J. Friedman,et al.  A Statistical View of Some Chemometrics Regression Tools , 1993 .

[111]  Paul J Laurienti,et al.  Analyzing complex functional brain networks: Fusing statistics and network science to understand the brain*† , 2013, Statistics surveys.

[112]  J. S. Hunter,et al.  The 2 k — p Fractional Factorial Designs , 1961 .

[113]  Charles Radin,et al.  Emergent Structures in Large Networks , 2013, J. Appl. Probab..

[114]  Christoph Stadtfeld,et al.  Multilevel social spaces: The network dynamics of organizational fields , 2017, Network Science.

[115]  Martina Morris,et al.  A statnet Tutorial. , 2008, Journal of statistical software.

[116]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[117]  David Krackhardt,et al.  PREDICTING WITH NETWORKS: NONPARAMETRIC MULTIPLE REGRESSION ANALYSIS OF DYADIC DATA * , 1988 .

[118]  A. Rinaldo,et al.  On the geometry of discrete exponential families with application to exponential random graph models , 2008, 0901.0026.

[119]  A. Rinaldo,et al.  Random networks, graphical models and exchangeability , 2017, 1701.08420.

[120]  Bart P. G. Van Parys,et al.  Sparse high-dimensional regression: Exact scalable algorithms and phase transitions , 2017, The Annals of Statistics.

[121]  P. Bearman,et al.  Chains of Affection: The Structure of Adolescent Romantic and Sexual Networks1 , 2004, American Journal of Sociology.

[122]  Juyong Park,et al.  Solution for the properties of a clustered network. , 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[123]  Gene H. Golub,et al.  Matrix computations (3rd ed.) , 1996 .

[124]  Johan Koskinen,et al.  Using latent variables to account for heterogeneity in exponential family random graph models , 2009 .

[125]  Gareth M. James,et al.  Improved variable selection with Forward-Lasso adaptive shrinkage , 2011, 1104.3390.

[126]  Mark S. Handcock,et al.  Analysis of networks with missing data with application to the National Longitudinal Study of Adolescent Health , 2017, Journal of the Royal Statistical Society. Series C, Applied statistics.

[127]  Martin J. Wainwright,et al.  Information-Theoretic Limits on Sparsity Recovery in the High-Dimensional and Noisy Setting , 2007, IEEE Transactions on Information Theory.

[128]  A. Rinaldo,et al.  Consistency of spectral clustering in stochastic block models , 2013, 1312.2050.

[129]  Miranda J. Lubbers,et al.  Group composition and network structure in school classes: a multilevel application of the p∗ model , 2003, Soc. Networks.

[130]  Mark S. Handcock,et al.  A framework for the comparison of maximum pseudo-likelihood and maximum likelihood estimation of exponential family random graph models , 2009, Soc. Networks.

[131]  Dimitris Bertsimas,et al.  Sparse Regression: Scalable Algorithms and Empirical Performance , 2019, Statistical Science.

[132]  Pavel N Krivitsky,et al.  Computational Statistical Methods for Social Network Models , 2012, Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America.

[133]  Dimitris Bertsimas,et al.  Sparse classification: a scalable discrete optimization perspective , 2017, Machine Learning.

[134]  Thomas Brendan Murphy,et al.  Variational Bayesian inference for the Latent Position Cluster Model for network data , 2009, Comput. Stat. Data Anal..

[135]  Thomas Brendan Murphy,et al.  Multiresolution Network Models , 2016, Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America.

[136]  Hussein Hazimeh,et al.  Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms , 2018, Oper. Res..

[137]  S. Wasserman,et al.  Logit models and logistic regressions for social networks: II. Multivariate relations. , 1999, The British journal of mathematical and statistical psychology.

[138]  T. Suesse Marginalized Exponential Random Graph Models , 2012 .

[139]  Sumit Mukherjee,et al.  Degeneracy in sparse ERGMs with functions of degrees as sufficient statistics , 2013 .

[140]  Jean-Jacques Fuchs,et al.  On sparse representations in arbitrary redundant bases , 2004, IEEE Transactions on Information Theory.

[141]  Carter T. Butts,et al.  Multiple imputation for missing edge data: A predictive evaluation method with application to Add Health , 2016, Soc. Networks.

[142]  Robert Haining,et al.  Statistics for spatial data: by Noel Cressie, 1991, John Wiley & Sons, New York, 900 p., ISBN 0-471-84336-9, US $89.95 , 1993 .

[143]  Eric D. Kolaczyk,et al.  Statistical Analysis of Network Data: Methods and Models , 2009 .

[144]  Wim van den Noortgate,et al.  Information seeking in secondary schools: A multilevel network approach , 2017, Soc. Networks.

[145]  J. Stuart Hunter,et al.  The 2 k—p Fractional Factorial Designs Part I , 2000, Technometrics.

[146]  W. Dempsey,et al.  A Statistical Framework for Modern Network Science , 2021 .

[147]  Fabrizio De Vico Fallani,et al.  A statistical model for brain networks inferred from large-scale electrophysiological signals , 2016, Journal of The Royal Society Interface.

[148]  Bin Yu,et al.  Spectral clustering and the high-dimensional stochastic blockmodel , 2010, 1007.1684.

[149]  David R. Hunter,et al.  Curved exponential family models for social networks , 2007, Soc. Networks.

[150]  B. Efron Defining the Curvature of a Statistical Problem (with Applications to Second Order Efficiency) , 1975 .

[151]  N. Meinshausen,et al.  Stability selection , 2008, 0809.2932.

[152]  Krista Gile Improved Inference for Respondent-Driven Sampling Data With Application to HIV Prevalence Estimation , 2010, 1006.4837.

[153]  Tom A. B. Snijders,et al.  A comparison of various approaches to the exponential random graph model: A reanalysis of 102 student networks in school classes , 2007, Soc. Networks.

[154]  R. Fisher Two New Properties of Mathematical Likelihood , 1934 .

[155]  P. Holland,et al.  Local Structure in Social Networks , 1976 .

[156]  Edoardo M. Airoldi,et al.  A Survey of Statistical Network Models , 2009, Found. Trends Mach. Learn..

[157]  Adrian E. Raftery,et al.  Properties of latent variable network models , 2015, Network Science.

[158]  E. Candès,et al.  Stable signal recovery from incomplete and inaccurate measurements , 2005, math/0503066.

[159]  Georgia Perakis,et al.  The Impact of Linear Optimization on Promotion Planning , 2014, Oper. Res..

[160]  Zack W. Almquist,et al.  Using Radical Environmentalist Texts to Uncover Network Structure and Network Features , 2019 .

[161]  N. Meinshausen,et al.  High-dimensional graphs and variable selection with the Lasso , 2006, math/0608017.

[162]  S. Goodreau,et al.  Birds of a feather, or friend of a friend? using exponential random graph models to investigate adolescent social networks* , 2009, Demography.

[163]  D. Bertsimas,et al.  Best Subset Selection via a Modern Optimization Lens , 2015, 1507.03133.

[164]  S. Wasserman,et al.  Logit models and logistic regressions for social networks: I. An introduction to Markov graphs andp , 1996 .

[165]  Robert W. Wilson,et al.  Regressions by Leaps and Bounds , 2000, Technometrics.

[166]  Paul J. Laurienti,et al.  Exponential Random Graph Modeling for Complex Brain Networks , 2010, PloS one.

[167]  D. Rubin INFERENCE AND MISSING DATA , 1975 .

[168]  Paul Grigas,et al.  A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives , 2015, ArXiv.

[169]  T. Snijders,et al.  10. Settings in Social Networks: A Measurement Model , 2003 .

[170]  Christian Borgs,et al.  Sampling perspectives on sparse exchangeable graphs , 2017, The Annals of Probability.

[171]  Martina Morris,et al.  Multilevel network data facilitate statistical inference for curved ERGMs with geometrically weighted terms , 2019, Soc. Networks.

[172]  Jing Wang,et al.  Approximate Bayesian Computation for Exponential Random Graph Models for Large Social Networks , 2014, Commun. Stat. Simul. Comput..

[173]  D. J. Strauss,et al.  Pseudolikelihood Estimation for Social Networks , 1990 .

[174]  Pavel N. Krivitsky,et al.  Using contrastive divergence to seed Monte Carlo MLE for exponential-family random graph models , 2017, Comput. Stat. Data Anal..

[175]  Jonathan Stewart,et al.  Concentration and consistency results for canonical and curved exponential-family models of random graphs , 2017, 1702.01812.

[176]  Robert H. Berk,et al.  Consistency and Asymptotic Normality of MLE's for Exponential Models , 1972 .

[177]  Susan A. Murphy,et al.  Monographs on statistics and applied probability , 1990 .

[178]  Peter D. Hoff,et al.  Latent Space Approaches to Social Network Analysis , 2002 .

[179]  R. Mazumder,et al.  Sparse regression at scale: branch-and-bound rooted in first-order optimization , 2020, Mathematical programming.

[180]  Faming Liang,et al.  A Monte Carlo Metropolis-Hastings Algorithm for Sampling from Distributions with Intractable Normalizing Constants , 2013, Neural Computation.

[181]  M. Schweinberger Instability, Sensitivity, and Degeneracy of Discrete Exponential Families , 2011, Journal of the American Statistical Association.

[182]  N. Birbaumer,et al.  BCI2000: a general-purpose brain-computer interface (BCI) system , 2004, IEEE Transactions on Biomedical Engineering.

[183]  Anderson Y. Zhang,et al.  Minimax Rates of Community Detection in Stochastic Block Models , 2015, ArXiv.

[184]  Peng Wang,et al.  Modelling a disease-relevant contact network of people who inject drugs , 2013, Soc. Networks.

[185]  Garry Robins,et al.  Introduction to multilevel social networks , 2016, Soc. Networks.

[186]  C. Geyer,et al.  Supporting Theory and Data Analysis for "Long Range Search for Maximum Likelihood in Exponential Families" , 2011 .

[187]  Alessandro Rinaldo,et al.  Asymptotic quantization of exponential random graphs , 2013, 1311.1738.

[188]  Peter D. Hoff,et al.  Modeling homophily and stochastic equivalence in symmetric relational data , 2007, NIPS.

[189]  Po-Ling Loh,et al.  Support recovery without incoherence: A case for nonconvex regularization , 2014, ArXiv.

[190]  Saharon Rosset,et al.  When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples , 2014 .

[191]  Johan Koskinen,et al.  Essays on Bayesian Inference for Social Networks , 2004 .

[192]  Johan H. Koskinen,et al.  Multilevel embeddedness: The case of the global fisheries governance complex , 2016, Soc. Networks.

[193]  Pavel N. Krivitsky,et al.  Exponential-family Random Graph Models for Rank-order Relational Data , 2012, 1210.0493.

[194]  Erricos John Kontoghiorghes,et al.  A branch and bound algorithm for computing the best subset regression models , 2002 .

[195]  J. Besag Spatial Interaction and the Statistical Analysis of Lattice Systems , 1974 .

[196]  Alberto Caimo,et al.  Bayesian model selection for exponential random graph models , 2012, Soc. Networks.

[197]  P. Holland,et al.  Holland and Leinhardt Reply: Some Evidence on the Transitivity of Positive Interpersonal Sentiment , 1972, American Journal of Sociology.

[198]  Athina Markopoulou,et al.  Towards Unbiased BFS Sampling , 2011, IEEE Journal on Selected Areas in Communications.

[199]  T. Hastie,et al.  SparseNet: Coordinate Descent With Nonconvex Penalties , 2011, Journal of the American Statistical Association.

[200]  O. Barndorff-Nielsen Information and Exponential Families in Statistical Theory , 1980 .

[201]  Cornelis J. Stam,et al.  Bayesian exponential random graph modeling of whole-brain structural networks across lifespan , 2016, NeuroImage.

[202]  Mark S Handcock,et al.  7. Respondent-Driven Sampling: An Assessment of Current Methodology , 2009, Sociological methodology.

[203]  Daniel M. Roy,et al.  Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[204]  Stanley Wasserman,et al.  Social Network Analysis: Methods and Applications , 1994, Structural analysis in the social sciences.

[205]  P. Bickel,et al.  The method of moments and degree distributions for network models , 2011, 1202.5101.

[206]  Bruce A. Desmarais,et al.  Statistical Inference for Valued-Edge Networks: The Generalized Exponential Random Graph Model , 2011, PloS one.

[207]  Garry Robins,et al.  Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation , 2010 .

[208]  Matthew Richardson,et al.  Markov logic networks , 2006, Machine Learning.

[209]  Sijian Wang,et al.  RANDOM LASSO. , 2011, The annals of applied statistics.

[210]  Mark S Handcock,et al.  Local dependence in random graph models: characterization, properties and statistical inference , 2015, Journal of the American Statistical Association.

[211]  F. Liang,et al.  Fitting Social Network Models Using Varying Truncation Stochastic Approximation MCMC Algorithm , 2013 .

[212]  Pradeep Ravikumar,et al.  Graphical models via univariate exponential family distributions , 2013, J. Mach. Learn. Res..

[213]  David Welch,et al.  A Network‐based Analysis of the 1861 Hagelloch Measles Data , 2012, Biometrics.

[214]  Dimitris Bertsimas,et al.  OR Forum - An Algorithmic Approach to Linear Regression , 2016, Oper. Res..

[215]  H. Zou The Adaptive Lasso and Its Oracle Properties , 2006 .

[216]  L. Breiman Better subset regression using the nonnegative garrote , 1995 .

[217]  A. Frieze,et al.  Introduction to Random Graphs , 2016 .

[218]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[219]  Armeen Taeb,et al.  Discussion on: Sparse regression: Scalable algorithms and empirical performance & Best Subset, Forward Stepwise, or Lasso? Analysis and recommendations based on extensive comparisons , 2020 .

[220]  W. Dempsey,et al.  Edge Exchangeable Models for Interaction Networks , 2018, Journal of the American Statistical Association.

[221]  Adrian E. Raftery,et al.  Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models , 2009, Soc. Networks.

[222]  Abhimanyu Das,et al.  Approximate Submodularity and its Applications: Subset Selection, Sparse Approximation and Dictionary Selection , 2018, J. Mach. Learn. Res..

[223]  Roel Bosker,et al.  Multilevel analysis : an introduction to basic and advanced multilevel modeling , 1999 .

[224]  M. Newman,et al.  Solution of the two-star model of a network. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[225]  Faming Liang,et al.  Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler. , 2013, Statistics and its interface.

[226]  Alberto Caimo,et al.  Efficient computational strategies for doubly intractable problems with applications to Bayesian social networks , 2014, Stat. Comput..

[227]  Peng Wang,et al.  Univariate and multivariate models of positive and negative networks: Liking, disliking, and bully-victim relationships , 2012, Soc. Networks.

[228]  Edward I. George Modern Variable Selection in Action: Comment on the Papers by HTT and BPV , 2020 .

[229]  P. Pattison,et al.  Random graph models for temporal processes in social networks , 2001 .

[230]  Dimitris Bertsimas,et al.  Characterization of the equivalence of robustification and regularization in linear and matrix regression , 2017, Eur. J. Oper. Res..

[231]  M. Kendall,et al.  The discarding of variables in multivariate analysis. , 1967, Biometrika.

[232]  P. Pattison,et al.  9. Neighborhood-Based Models for Social Networks , 2002 .

[233]  P. Erdos,et al.  On the evolution of random graphs , 1984 .

[234]  Bruce A. Desmarais,et al.  Temporal Exponential Random Graph Models with btergm: Estimation and Bootstrap Confidence Intervals , 2018 .

[235]  Angelo Mele,et al.  A Structural Model of Dense Network Formation , 2017 .

[236]  S. Berg Snowball Sampling—I , 2006 .

[237]  Daniel M. Roy,et al.  Sampling and Estimation for (Sparse) Exchangeable Graphs , 2016, The Annals of Statistics.

[238]  Yuguo Chen,et al.  A block model for node popularity in networks with community structure , 2018 .

[239]  George E. P. Box,et al.  The $2^{k-p}$ Fractional Factorial Designs Part II. , 1961 .

[240]  Paul J. Laurienti,et al.  An exponential random graph modeling approach to creating group-based representative whole-brain connectivity networks , 2011, NeuroImage.

[241]  C. Geyer,et al.  Constrained Monte Carlo Maximum Likelihood for Dependent Data , 1992 .

[242]  Pavel N Krivitsky,et al.  A separable model for dynamic networks , 2010, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[243]  P. Holland,et al.  A Method for Detecting Structure in Sociometric Data , 1970, American Journal of Sociology.

[244]  P. Bühlmann,et al.  A Look at Robustness and Stability of $\ell_1$- versus $\ell_0$-Regularization: Discussion of Papers by Bertsimas et al. and Hastie et al. , 2020 .

[245]  R. Tibshirani,et al.  Strong rules for discarding predictors in lasso‐type problems , 2010, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[246]  Harrison H. Zhou,et al.  Rate-optimal graphon estimation , 2014, 1410.5837.

[247]  Garry Robins,et al.  Social selection models for multilevel networks , 2016, Soc. Networks.

[248]  P. Bickel,et al.  A nonparametric view of network models and Newman–Girvan and other modularities , 2009, Proceedings of the National Academy of Sciences.

[249]  Minas Gjoka,et al.  Coarse-grained topology estimation via graph sampling , 2011, WOSN '12.

[250]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[251]  Jennifer Neville,et al.  Relational Learning with One Network: An Asymptotic Analysis , 2011, AISTATS.

[252]  Michael Schweinberger,et al.  hergm: Hierarchical Exponential-Family Random Graph Models , 2018 .

[253]  Lucas Janson,et al.  Effective degrees of freedom: a flawed metaphor , 2015 .

[254]  Richard F. Gunst,et al.  Applied Regression Analysis , 1999, Technometrics.

[255]  Isabella Gollini,et al.  A multilayer exponential random graph modelling approach for weighted networks , 2018, Comput. Stat. Data Anal..

[256]  Martin J. Wainwright,et al.  Sparse learning via Boolean relaxations , 2015, Mathematical Programming.

[257]  Martina Morris,et al.  Adjusting for Network Size and Composition Effects in Exponential-Family Random Graph Models. , 2010, Statistical methodology.

[258]  Joshua T. Vogelstein,et al.  Covariate-assisted spectral clustering , 2014, Biometrika.

[259]  M. McPherson An Ecology of Affiliation , 1983 .

[260]  S. Rosset,et al.  When Does More Regularization Imply Fewer Degrees of Freedom? Sufficient Conditions and Counter Examples from Lasso and Ridge Regression , 2013, 1311.2791.

[261]  Marc Hofmann,et al.  Efficient algorithms for computing the best subset regression models for large-scale problems , 2007, Comput. Stat. Data Anal..

[262]  T. Snijders,et al.  Estimation and Prediction for Stochastic Blockstructures , 2001 .

[263]  R. Fisher,et al.  On the Mathematical Foundations of Theoretical Statistics , 1922 .

[264]  Jeff T. Linderoth,et al.  Regularization vs. Relaxation: A conic optimization perspective of statistical variable selection , 2015, ArXiv.

[265]  Gianmarc Grazioli,et al.  Network-Based Classification and Modeling of Amyloid Fibrils. , 2019, The journal of physical chemistry. B.

[266]  Richard W. Kenyon,et al.  On the asymptotics of constrained exponential random graphs , 2017, J. Appl. Probab..

[267]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[268]  Thomas Brendan Murphy,et al.  Review of statistical network analysis: models, algorithms, and software , 2012, Stat. Anal. Data Min..

[269]  J. S. Hunter,et al.  Statistics for experimenters: an introduction to design, data analysis, and model building , 1979 .

[270]  A. Rinaldo,et al.  CONSISTENCY UNDER SAMPLING OF EXPONENTIAL RANDOM GRAPH MODELS. , 2011, Annals of statistics.

[271]  Carter T. Butts,et al.  Spatial Modeling of Social Networks , 2011 .

[272]  Martin J. Wainwright,et al.  Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using $\ell_1$-Constrained Quadratic Programming (Lasso) , 2009, IEEE Transactions on Information Theory.

[273]  P. Zappa,et al.  The Analysis of Multilevel Networks in Organizations: Models and Empirical Tests , 2014 .

[274]  Jian Huang,et al.  COORDINATE DESCENT ALGORITHMS FOR NONCONVEX PENALIZED REGRESSION, WITH APPLICATIONS TO BIOLOGICAL FEATURE SELECTION. , 2011, The annals of applied statistics.

[275]  S. Janson On Edge Exchangeable Random Graphs , 2017, Journal of statistical physics.

[276]  Tom A. B. Snijders,et al.  Markov Chain Monte Carlo Estimation of Exponential Random Graph Models , 2002, J. Soc. Struct..

[277]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[278]  Stephen E. Fienberg,et al.  Statistical Inference in a Directed Network Model With Covariates , 2016, Journal of the American Statistical Association.

[279]  R. Tibshirani,et al.  Degrees of freedom in lasso problems , 2011, 1111.0653.

[280]  M. Bálek,et al.  Large Networks and Graph Limits , 2022 .

[281]  P. Radchenko,et al.  Subset Selection with Shrinkage: Sparse Linear Modeling When the SNR Is Low , 2017, Oper. Res..

[282]  D. Hunter,et al.  Goodness of Fit of Social Network Models , 2008 .

[283]  E. Ising Beitrag zur Theorie des Ferromagnetismus , 1925 .

[284]  Georgia Perakis,et al.  Scheduling Promotion Vehicles to Boost Profits , 2019, Manag. Sci..

[285]  Weijun Xie,et al.  Scalable Algorithms for the Sparse Ridge Regression , 2018, SIAM J. Optim..

[286]  Shie Mannor,et al.  Robust Regression and Lasso , 2008, IEEE Transactions on Information Theory.

[287]  T. Snijders,et al.  p2: a random effects model with covariates for directed graphs , 2004 .

[288]  S. Stigler Gauss and the Invention of Least Squares , 1981 .

[289]  L. Breiman Heuristics of instability and stabilization in model selection , 1996 .

[290]  Daniel M. Roy,et al.  The Class of Random Graphs Arising from Exchangeable Random Measures , 2015, ArXiv.

[291]  Walter Willinger,et al.  Mathematics and the Internet: A Source of Enormous Confusion and Great Potential , 2009, The Best Writing on Mathematics 2010.

[292]  R. Tibshirani,et al.  PATHWISE COORDINATE OPTIMIZATION , 2007, 0708.1485.

[293]  Tian Zheng,et al.  GLMLE: graph-limit enabled fast computation for fitting exponential random graph models to large social networks , 2015, Social Network Analysis and Mining.

[294]  C. Stein,et al.  Estimation with Quadratic Loss , 1992 .

[295]  Carter T. Butts,et al.  A perfect sampling method for exponential family random graph models , 2017, ArXiv.

[296]  Stephen A. Smith,et al.  Clearance Pricing and Inventory Policies for Retail Chains , 1998 .

[297]  Nicolai Meinshausen,et al.  Relaxed Lasso , 2007, Comput. Stat. Data Anal..

[298]  Edoardo M. Airoldi,et al.  Mixed Membership Stochastic Blockmodels , 2007, NIPS.

[299]  E. George,et al.  The Spike-and-Slab LASSO , 2018 .

[300]  Pavel N Krivitsky,et al.  Exponential-Family Random Graph Models for Multi-Layer Networks. , 2020, Psychometrika.

[301]  Peter D Hoff,et al.  Testing and Modeling Dependencies Between a Network and Nodal Attributes , 2013, Journal of the American Statistical Association.

[302]  Peng Zhao,et al.  On Model Selection Consistency of Lasso , 2006, J. Mach. Learn. Res..

[303]  M.A.J. van Duijn Estimation of a Random Effects Model for Directed Graphs. , 1995 .

[304]  Yuguo Chen,et al.  Latent Space Models for Dynamic Networks , 2015, 2005.08808.

[305]  Allan Sly,et al.  Random graphs with a given degree sequence , 2010, 1005.1136.

[306]  M. Talagrand A new look at independence , 1996 .

[307]  Rae. Z.H. Aliyev,et al.  Interpolation of Spatial Data , 2018, Biomedical Journal of Scientific & Technical Research.

[308]  David Gamarnik,et al.  High Dimensional Regression with Binary Coefficients. Estimating Squared Error and a Phase Transition , 2017, COLT.

[309]  Peng Wang,et al.  Closure, connectivity and degree distributions: Exponential random graph (p*) models for directed social networks , 2009, Soc. Networks.

[310]  S. Mukherjee,et al.  DETECTION THRESHOLDS FOR THE β-MODEL ON SPARSE GRAPHS , 2017 .

[311]  David C. Miller,et al.  Learning surrogate models for simulation‐based optimization , 2014 .

[312]  Johan Bollen,et al.  Twitter mood predicts the stock market , 2010, J. Comput. Sci..

[313]  Vishesh Karwa,et al.  Inference using noisy degrees: Differentially private $\beta$-model and synthetic graphs , 2012, 1205.4697.

[314]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[315]  R. Tibshirani The Lasso Problem and Uniqueness , 2012, 1206.0313.

[316]  Ji Zhu,et al.  Consistency of community detection in networks under degree-corrected stochastic block models , 2011, 1110.3854.

[317]  Joel Spencer,et al.  The degree sequence of a scale-free random graph process , 2001 .

[318]  R. Tibshirani,et al.  On the “degrees of freedom” of the lasso , 2007, 0712.0881.

[319]  Hong Qin,et al.  Asymptotic normality in the maximum entropy models on graphs with an increasing number of parameters , 2013, J. Multivar. Anal..

[320]  Nikolaos V. Sahinidis,et al.  A combined first-principles and data-driven approach to model building , 2015, Comput. Chem. Eng..

[321]  Gábor Lugosi,et al.  Concentration Inequalities - A Nonasymptotic Theory of Independence , 2013, Concentration Inequalities.

[322]  J. S. Hunter,et al.  Multi-Factor Experimental Designs for Exploring Response Surfaces , 1957 .

[323]  P. Holland,et al.  An Exponential Family of Probability Distributions for Directed Graphs , 1981 .

[324]  Richard G. Everitt,et al.  Bayesian Parameter Estimation for Latent Markov Random Fields and Social Networks , 2012, ArXiv.

[325]  Neha Gondal,et al.  Duality of departmental specializations and PhD exchange: A Weberian analysis of status in interaction using multilevel exponential random graph models (mERGM) , 2018, Soc. Networks.

[326]  Faming Liang,et al.  An Adaptive Exchange Algorithm for Sampling From Distributions With Intractable Normalizing Constants , 2016 .

[327]  Balas K. Natarajan,et al.  Sparse Approximate Solutions to Linear Systems , 1995, SIAM J. Comput..

[328]  A. Rinaldo,et al.  Estimation for Dyadic-Dependent Exponential Random Graph Models , 2014 .

[329]  Liam Paninski,et al.  Fast online deconvolution of calcium imaging data , 2016, PLoS Comput. Biol..

[330]  Padhraic Smyth,et al.  Learning with Blocks: Composite Likelihood and Contrastive Divergence , 2010, AISTATS.

[331]  Minas Gjoka,et al.  Estimating clique composition and size distributions from sampled network data , 2013, 2014 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS).

[332]  Peter J. Bickel,et al.  Pseudo-likelihood methods for community detection in large sparse networks , 2012, 1207.2340.

[333]  Panagiotis G. Ipeirotis,et al.  Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics , 2010, IEEE Transactions on Knowledge and Data Engineering.

[334]  N. Meinshausen,et al.  Anchor regression: Heterogeneous data meet causality , 2018, Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[335]  Trevor Campbell,et al.  Edge-exchangeable graphs and sparsity , 2016, NIPS.

[336]  Mark S Handcock,et al.  MODELING SOCIAL NETWORKS FROM SAMPLED DATA. , 2010, The annals of applied statistics.

[337]  Peter D. Hoff,et al.  Bilinear Mixed-Effects Models for Dyadic Data , 2005 .

[338]  Edoardo M. Airoldi,et al.  Stochastic blockmodels with growing number of classes , 2010, Biometrika.

[339]  Alper Atamtürk,et al.  Rank-one Convexification for Sparse Regression , 2019, ArXiv.

[340]  Hongyu Zhao,et al.  Network Clustering Analysis Using Mixture Exponential-Family Random Graph Models and Its Application in Genetic Interaction Data , 2019, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[341]  Peng Wang,et al.  Exponential random graph models for multilevel networks , 2013, Soc. Networks.