Optimizing Classifier Performance Via the Wilcoxon-Mann-Whitney Statistic

Cross entropy and mean squared error are typical cost functions used to optimize classifier performance. The goal of the optimization is usually to achieve the best correct classification rate. However, for many two-class real-world problems, the ROC curve is a more meaningful performance measure. We demonstrate that minimizing cross entropy or mean squared error does not necessarily maximize the area under the ROC curve (AUC). We then consider alternative objective functions for training a classifier to maximize the AUC directly. We propose an objective function that is an approximation to the Wilcoxon-Mann-Whitney statistic, which is equivalent to AUC. The proposed objective function is differentiable, so gradient-based methods can be used to train the classifier. After discussing the improved results of the new objective function over several UCI data sets, we apply the new objective function to real-world customer behavior prediction problems for a wireless service provider and a cable service provider, and achieve reliable and significant improvements in the ROC curve.
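To make the relationship between the Wilcoxon-Mann-Whitney statistic and AUC concrete, the following is a minimal sketch rather than the objective proposed in the paper. It computes the statistic as the fraction of (positive, negative) score pairs that are ranked correctly, and then a smooth stand-in that replaces the step function with a sigmoid so that gradients exist. The function names, the sigmoid surrogate, the sharpness parameter `beta`, and the convention of counting ties as one half are all assumptions made for illustration.

```python
import numpy as np


def wmw_statistic(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    This pairwise count equals the empirical AUC. Counting ties as 0.5 is a
    common convention and an assumption here, not taken from the abstract.
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    # diff[i, j] = score of positive i minus score of negative j
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def smooth_wmw(pos_scores, neg_scores, beta=10.0):
    """Differentiable surrogate: the step function is replaced by a sigmoid.

    Hypothetical illustration only; `beta` controls how closely the sigmoid
    approximates the step. The paper's actual approximation may differ.
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(1.0 / (1.0 + np.exp(-beta * diff))))


if __name__ == "__main__":
    pos = [0.9, 0.8, 0.4]   # classifier outputs for positive examples
    neg = [0.7, 0.3, 0.2]   # classifier outputs for negative examples
    print(wmw_statistic(pos, neg))          # 8 of 9 pairs are ordered correctly
    print(smooth_wmw(pos, neg, beta=10.0))  # smooth value, usable with gradients
```

Because the surrogate is a mean of smooth functions of the score differences, it could be maximized with any gradient-based optimizer; a larger `beta` tracks the exact statistic more closely at the cost of flatter gradients far from the decision boundary.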
