Measuring the Stability of Results From Supervised Statistical Learning

ABSTRACT Stability is a major requirement for drawing reliable conclusions when interpreting results from supervised statistical learning. In this article, we present a general framework for assessing and comparing the stability of results, which can be used in real-world statistical learning applications as well as in simulation and benchmark studies. We use the framework to show that stability is a property of both the algorithm and the data-generating process. In particular, we demonstrate that unstable algorithms (such as recursive partitioning) can produce stable results when the functional form of the relationship between the predictors and the response matches the algorithm. Typical uses of the framework in practical data analysis are to compare the stability of results generated by different candidate algorithms for a dataset at hand, or to assess the stability of algorithms in a benchmark study. Code to perform the stability analyses is provided in the form of an R package. Supplementary material for this article is available online.
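To illustrate the resampling idea behind such a framework, the sketch below draws pairs of bootstrap samples, refits a learner on each, and measures how often the two refitted models agree on common evaluation data. This is a minimal illustration of the general principle only, not the API of the accompanying R package; the function name `prediction_stability` and the use of `rpart` on the built-in `iris` data are assumptions made for the example.

```r
## Minimal sketch: stability as agreement between models refitted on
## independent bootstrap samples of the same data. Illustration only.
library(rpart)  # recursive partitioning, used here as an example learner

set.seed(1)
data(iris)

prediction_stability <- function(data, B = 100) {
  n <- nrow(data)
  agreement <- numeric(B)
  for (b in seq_len(B)) {
    ## two independent bootstrap samples from the same data
    i1 <- sample(n, replace = TRUE)
    i2 <- sample(n, replace = TRUE)
    fit1 <- rpart(Species ~ ., data = data[i1, ])
    fit2 <- rpart(Species ~ ., data = data[i2, ])
    ## compare predicted classes on the full dataset
    p1 <- predict(fit1, newdata = data, type = "class")
    p2 <- predict(fit2, newdata = data, type = "class")
    agreement[b] <- mean(p1 == p2)
  }
  agreement
}

agr <- prediction_stability(iris, B = 100)
summary(agr)  # distribution of pairwise prediction agreement
```

Running the same procedure with different candidate learners and comparing the resulting agreement distributions corresponds to the practical use described in the abstract: choosing, for a dataset at hand, the algorithm whose results are most stable under resampling.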
