An empirical comparison of supervised learning algorithms

A number of supervised learning methods have been introduced in the last decade. Unfortunately, the last comprehensive empirical evaluation of supervised learning was the Statlog Project in the early 1990s. We present a large-scale empirical comparison of ten supervised learning methods: SVMs, neural nets, logistic regression, naive Bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. We also examine the effect that calibrating the models via Platt Scaling and Isotonic Regression has on their performance. An important aspect of our study is the use of a variety of performance criteria to evaluate the learning methods.
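For readers unfamiliar with the two calibration methods named above, the following is a minimal sketch of post-hoc calibration, not the authors' implementation. It assumes scikit-learn, a synthetic dataset, and a linear SVM as the base model, fits Platt Scaling (a sigmoid on held-out decision values) and Isotonic Regression (a monotone, piecewise-constant mapping) on a calibration split, and then compares Brier scores on a separate test split.

```python
# Hypothetical sketch of Platt Scaling and Isotonic Regression
# (scikit-learn; synthetic data and a linear SVM are illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

# Placeholder data, split into train / calibration / test sets.
X, y = make_classification(n_samples=4000, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Base model produces uncalibrated margins (decision values), not probabilities.
svm = LinearSVC(max_iter=10000).fit(X_train, y_train)
s_cal = svm.decision_function(X_cal)
s_test = svm.decision_function(X_test)

# Platt Scaling: fit a logistic sigmoid to the held-out scores.
platt = LogisticRegression().fit(s_cal.reshape(-1, 1), y_cal)
p_platt = platt.predict_proba(s_test.reshape(-1, 1))[:, 1]

# Isotonic Regression: fit a monotone, piecewise-constant mapping to the scores.
iso = IsotonicRegression(out_of_bounds="clip").fit(s_cal, y_cal)
p_iso = iso.predict(s_test)

print("Brier score, Platt:   ", brier_score_loss(y_test, p_platt))
print("Brier score, isotonic:", brier_score_loss(y_test, p_iso))
```

The Brier score is only one of several probability-oriented criteria one might report here; the same calibrated probabilities could equally be scored with log loss or inspected via reliability diagrams.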
