Making logistic regression a core data mining tool with TR-IRLS

Binary classification is a core data mining task. For large datasets or real-time applications, a desirable classifier is accurate, fast, and needs no parameter tuning. We present a simple implementation of logistic regression that meets these requirements. A combination of regularization, truncated Newton methods, and iteratively re-weighted least squares (IRLS) makes it faster and more accurate than modern SVM implementations, and relatively insensitive to its own parameters. It is robust to linear dependencies and some scaling problems, making most data preprocessing unnecessary.
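The sketch below shows one way regularization, IRLS, and a truncated Newton solve can fit together. It is a minimal illustration, not the paper's implementation, and it makes several assumptions:

- The regularizer is an L2 (ridge) penalty (λ/2)‖β‖².
- The design matrix `X` is a dense list of rows that includes a column of ones for the intercept.
- The "truncated" inner solver is linear conjugate gradient (CG), capped at a fixed number of iterations and warm-started from the current coefficients.

The names `tr_irls` and `cg`, and the defaults for `lam`, `cg_iters`, and the tolerances, are hypothetical choices.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(X, v):
    # X v for a dense matrix stored as a list of rows.
    return [dot(row, v) for row in X]


def rmatvec(X, u):
    # X^T u, computed without forming the transpose.
    out = [0.0] * len(X[0])
    for row, ui in zip(X, u):
        for j, xij in enumerate(row):
            out[j] += xij * ui
    return out


def cg(apply_A, b, x0, max_iter, tol):
    # Linear conjugate gradient for A x = b (A symmetric positive definite),
    # stopped early after max_iter steps or once ||b - A x|| <= tol.
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, apply_A(x))]
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rr) <= tol:
            break
        Ap = apply_A(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def tr_irls(X, y, lam=1.0, irls_iters=30, cg_iters=10, cg_tol=1e-6, tol=1e-8):
    # Hypothetical sketch: ridge-penalized logistic regression fitted by IRLS,
    # where each weighted least-squares system is solved approximately by CG.
    beta = [0.0] * len(X[0])
    for _ in range(irls_iters):
        eta = matvec(X, beta)
        mu = [sigmoid(e) for e in eta]
        w = [m * (1.0 - m) for m in mu]  # IRLS weights mu_i (1 - mu_i)
        # Right-hand side X^T W z with adjusted response z = eta + (y - mu) / w,
        # written as X^T (W eta + (y - mu)) so that w_i = 0 never divides.
        rhs = rmatvec(X, [wi * ei + (yi - mi) for wi, ei, yi, mi in zip(w, eta, y, mu)])

        def apply_A(v):
            # (X^T W X + lam I) v using only products with X and X^T.
            wxv = [wi * xv for wi, xv in zip(w, matvec(X, v))]
            return [a + lam * vi for a, vi in zip(rmatvec(X, wxv), v)]

        new_beta = cg(apply_A, rhs, beta, cg_iters, cg_tol)
        step = max(abs(a - b) for a, b in zip(new_beta, beta))
        beta = new_beta
        if step < tol:
            break
    return beta
```

For example, `tr_irls([[1.0, x] for x in xs], labels, lam=1.0)` returns `[intercept, slope]`. At convergence, the ridge-penalized score equation Xᵀ(y − μ) = λβ holds.

CG needs only matrix-vector products, so the possibly large matrix XᵀWX is never formed. With λ > 0, XᵀWX + λI stays positive definite even when columns are duplicated or collinear. That is one plausible reading of the robustness to linear dependencies claimed above.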
