Significance tests or confidence intervals: which are preferable for the comparison of classifiers?
[1] F. Schmidt. Statistical Significance Testing and Cumulative Knowledge in Psychology: Implications for Training of Researchers , 1996 .
[2] Robert E. McGrath,et al. Alternatives to null hypothesis significance testing. , 2011 .
[3] S. Goodman,et al. p values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate. , 1993, American journal of epidemiology.
[4] G. Cumming. Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis , 2011 .
[5] Pat Langley,et al. The changing science of machine learning , 2011, Machine Learning.
[6] Chris Drummond,et al. Machine Learning as an Experimental Science (Revisited) , 2006 .
[7] C. Drummond. Finding a Balance between Anarchy and Orthodoxy , 2008 .
[8] Janez Demsar,et al. Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..
[9] D. C. Hurst,et al. Large Sample Simultaneous Confidence Intervals for Multinomial Proportions , 1964 .
[10] Michael J. Marks,et al. The null hypothesis significance-testing debate and its implications for personality research. , 2007 .
[11] Thomas G. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.
[12] K. Manly,et al. Genomics, prior probability, and statistical tests of multiple hypotheses. , 2004, Genome research.
[13] S. Holm. A Simple Sequentially Rejective Multiple Test Procedure , 1979 .
[14] Peter Dixon,et al. Why scientists value p values , 1998 .
[15] Charles Dugas,et al. Pointwise exact bootstrap distributions of ROC curves , 2009, Machine Learning.
[16] S. Goodman. Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy , 1999, Annals of Internal Medicine.
[17] S. García,et al. An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons , 2008 .
[18] Gavin C. Cawley,et al. On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation , 2010, J. Mach. Learn. Res..
[19] Maliha S. Nash,et al. Handbook of Parametric and Nonparametric Statistical Procedures , 2001, Technometrics.
[20] C Poole,et al. Low P-Values or Narrow Confidence Intervals: Which Are More Durable? , 2001, Epidemiology.
[21] Q. Mcnemar. Note on the sampling error of the difference between correlated proportions or percentages , 1947, Psychometrika.
[22] Gemma C. Garriga,et al. Permutation Tests for Studying Classifier Performance , 2009, 2009 Ninth IEEE International Conference on Data Mining.
[23] Werner Dubitzky,et al. Avoiding model selection bias in small-sample genomic datasets , 2006, Bioinform..
[24] M. J. van de Vijver,et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. , 2006, Journal of the National Cancer Institute.
[25] Pat Langley,et al. Machine learning as an experimental science , 2004, Machine Learning.
[26] Douglas H. Johnson. The Insignificance of Statistical Significance Testing , 1999 .
[27] Trevor Hastie,et al. The Elements of Statistical Learning , 2001 .
[28] K J Rothman,et al. A show of confidence. , 1978, The New England journal of medicine.
[29] J. Jackson Barnette,et al. The Data Analysis Dilemma: Ban or Abandon. A Review of Null Hypothesis Significance Testing. , 1998 .
[30] Eibe Frank,et al. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms , 2004, PAKDD.
[31] Lois Ann Colaianni,et al. Uniform Requirements for Manuscripts Submitted to Biomedical Journals , 1991, The Medical journal of Australia.
[32] Daniel H. Robinson,et al. Further Reflections on Hypothesis Testing and Editorial Policy for Primary Research Journals , 1999 .
[33] Yoshua Bengio,et al. Inference for the Generalization Error , 1999, Machine Learning.
[34] Nathalie Japkowicz,et al. Warning: statistical benchmarking is addictive. Kicking the habit in machine learning , 2010, J. Exp. Theor. Artif. Intell..
[35] David J. Hand,et al. Classifier Technology and the Illusion of Progress , 2006, math/0606441.
[36] G. K. Robinson. On the necessity of Bayesian inference and the construction of measures of nearness to Bayesian form , 1978 .
[37] Leo Breiman,et al. Random Forests , 2001, Machine Learning.
[38] W D Johnson,et al. Confidence intervals for differences in correlated binary proportions. , 1997, Statistics in medicine.
[39] S. Goodman. A dirty dozen: twelve p-value misconceptions. , 2008, Seminars in hematology.
[40] R Core Team,et al. R: A language and environment for statistical computing. , 2014 .
[41] James O. Berger,et al. Statistical Analysis and the Illusion of Objectivity , 1988 .