An Empirical Comparison of Supervised Learning Algorithms Using Different Performance Metrics

We present results from a large-scale empirical comparison of ten supervised learning methods: SVMs, neural nets, logistic regression, naive Bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. We evaluate the methods on binary classification problems using nine performance criteria: accuracy, squared error, cross-entropy, ROC area, F-score, precision/recall breakeven point, average precision, lift, and calibration. Because some models (e.g., SVMs and boosted trees) do not predict well-calibrated probabilities, we compare the performance of the algorithms both before and after calibrating their predictions with Platt Scaling and Isotonic Regression. Before scaling, the models with the best overall performance are neural nets, bagged trees, and random forests. After scaling, the best models are boosted trees, random forests, and unscaled neural nets.
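The two calibration methods compared in the abstract are both available off the shelf. Below is a minimal sketch, not from the paper itself, of how one might apply Platt Scaling (a sigmoid fit) and Isotonic Regression to a model's scores using scikit-learn's CalibratedClassifierCV; the synthetic dataset, the random-forest base model, and all parameter values are illustrative assumptions.

```python
# Sketch: calibrating classifier scores with Platt Scaling ("sigmoid")
# and Isotonic Regression ("isotonic"). All choices below are
# illustrative, not the authors' original experimental setup.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (assumption, for illustration).
X, y = make_classification(n_samples=5000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = RandomForestClassifier(n_estimators=100, random_state=0)

for method in ("sigmoid", "isotonic"):  # Platt Scaling / Isotonic Regression
    calibrated = CalibratedClassifierCV(base, method=method, cv=3)
    calibrated.fit(X_train, y_train)
    probs = calibrated.predict_proba(X_test)[:, 1]
    # Lower Brier score indicates better-calibrated probabilities.
    print(f"{method}: Brier score = {brier_score_loss(y_test, probs):.4f}")
```

Isotonic regression is more flexible than Platt's sigmoid fit but needs more calibration data to avoid overfitting, which is consistent with the paper's motivation for comparing both.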
