Multiobjective genetic programming for maximizing ROC performance

In binary classification problems, receiver operating characteristic (ROC) graphs are commonly used for visualizing, organizing, and selecting classifiers based on their performance. An important issue in the ROC literature is obtaining the ROC convex hull (ROCCH), which covers the potentially optimal classifiers in a given set [1]. Maximizing the ROCCH means maximizing the true positive rate (tpr) and minimizing the false positive rate (fpr) for every classifier in ROC space, although tpr and fpr conflict with each other. In this paper, we propose multiobjective genetic programming (MOGP) to obtain a group of nondominated classifiers with which the maximum ROCCH can be achieved. Four multiobjective frameworks, namely Nondominated Sorting Genetic Algorithm II (NSGA-II), Multiobjective Evolutionary Algorithm Based on Decomposition (MOEA/D), Multiobjective Selection Based on Dominated Hypervolume (SMS-EMOA), and Approximation-Guided Evolutionary Multi-Objective (AG-EMOA), are adapted to GP because each has been applied successfully to many problems and has its own characteristics. To improve the performance of each individual in GP, we further propose a memetic approach that introduces two local search strategies designed specifically for classification problems. Experimental results on 27 well-known UCI data sets show that MOGP performs significantly better than single-objective algorithms such as FGP, GGP, EGP, and MGP, and than traditional machine learning algorithms such as C4.5, Naive Bayes, and PRIE. The experiments also demonstrate the efficacy of the local search operator in the MOGP framework.
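To make the relationship between nondominated classifiers and the ROCCH concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes each classifier is summarized by a single (fpr, tpr) point; the function names `nondominated`, `rocch`, and `hull_area` are hypothetical and chosen only for illustration. The hull is built with a standard monotone-chain upper-hull scan, anchored at the trivial classifiers (0, 0) and (1, 1).

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (fpr, tpr) of one classifier; an assumed representation


def nondominated(points: List[Point]) -> List[Point]:
    """Return points not dominated by any other (lower fpr and higher tpr are better)."""
    front = []
    for i, (f, t) in enumerate(points):
        dominated = any(
            f2 <= f and t2 >= t and (f2 < f or t2 > t)
            for j, (f2, t2) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((f, t))
    return sorted(set(front))


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rocch(points: List[Point]) -> List[Point]:
    """Upper convex hull in ROC space, from (0, 0) to (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last vertex while it lies on or below the segment to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def hull_area(hull: List[Point]) -> float:
    """Area under the hull by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2.0
        for (x1, y1), (x2, y2) in zip(hull, hull[1:])
    )
```

Every vertex of the hull other than the two trivial endpoints is a nondominated point, but a nondominated point may still fall below the hull, which is one reason a well-spread nondominated set matters when the goal is to maximize the ROCCH.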

[1]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..

[2]  Tom Fawcett,et al.  Using rule sets to maximize ROC performance , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[3]  Qingfu Zhang,et al.  Multiobjective Optimization Problems With Complicated Pareto Sets, MOEA/D and NSGA-II , 2009, IEEE Transactions on Evolutionary Computation.

[4]  Ken Sharman,et al.  A Genetic Programming Approach for Bankruptcy Prediction Using a Highly Unbalanced Database , 2007, EvoWorkshops.

[5]  Walter Alden Tackett,et al.  Genetic Programming for Feature Discovery and Image Discrimination , 1993, ICGA.

[6]  Peter A. Flach,et al.  ROCCER: An Algorithm for Rule Learning Based on ROC Analysis , 2005, IJCAI.

[7]  Kent A. Spackman,et al.  Signal Detection Theory: Valuable Tools for Evaluating Inductive Learning , 1989, ML.

[8]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[9]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[10]  G. Chapman,et al.  [Medical decision making]. , 1976, Lakartidningen.

[11]  John W. Backus,et al.  The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference , 1959, IFIP Congress.

[12]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[13]  Alvaro A. Cárdenas,et al.  Optimal ROC Curve for a Combination of Classifiers , 2007, NIPS.

[14]  Xin Yao,et al.  A Memetic Genetic Programming with decision tree-based local search for classification problems , 2011, 2011 IEEE Congress of Evolutionary Computation (CEC).

[15]  Raymond Chiong,et al.  Novel evolutionary algorithms for supervised classification problems: an experimental study , 2011, Evol. Intell..

[16]  James P. Egan,et al.  Signal detection theory and ROC analysis , 1975 .

[17]  Nicola Beume,et al.  SMS-EMOA: Multiobjective selection based on dominated hypervolume , 2007, Eur. J. Oper. Res..

[18]  Markus Wagner,et al.  Approximation-Guided Evolutionary Multi-Objective Optimization , 2011, IJCAI.

[19]  Tom Fawcett,et al.  Robust Classification for Imprecise Environments , 2000, Machine Learning.

[20]  Xin Yao,et al.  Cost-sensitive classification with genetic programming , 2005, 2005 IEEE Congress on Evolutionary Computation.

[21]  Riccardo Poli,et al.  A Field Guide to Genetic Programming , 2008 .

[22]  K. Deb,et al.  A fast and elitist multiobjective genetic algorithm , 2002 .

[23]  Mengjie Zhang,et al.  Fitness Functions in Genetic Programming for Classification with Unbalanced Data , 2007, Australian Conference on Artificial Intelligence.

[24]  Lothar Thiele,et al.  An evolutionary algorithm for multiobjective optimization: the strength Pareto approach , 1998 .

[25]  R. Austria Declaration , 1987 .

[26]  Tom Fawcett,et al.  Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions , 1997, KDD.

[27]  Tom Fawcett PRIE: a system for generating rulelists to maximize ROC performance , 2008, Data Mining and Knowledge Discovery.

[28]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[29]  András Kocsor,et al.  ROC analysis: applications to the classification of biological sequences and 3D structures , 2008, Briefings Bioinform..

[30]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[31]  F. Wilcoxon Individual Comparisons by Ranking Methods , 1945 .

[32]  Thomas J. Watson,et al.  An empirical study of the naive Bayes classifier , 2001 .

[33]  Marco Laumanns,et al.  SPEA2: Improving the strength pareto evolutionary algorithm , 2001 .

[34]  G. F. Hughes,et al.  On the mean accuracy of statistical pattern recognizers , 1968, IEEE Trans. Inf. Theory.

[35]  Lothar Thiele,et al.  Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach , 1999, IEEE Trans. Evol. Comput..

[36]  Lucien Le Cam Neyman–Pearson Lemma , 2005 .

[37]  Mahesan Niranjan,et al.  Realisable Classifiers: Improving Operating Performance on Variable Cost Problems , 1998, BMVC.

[38]  Raymond Chiong,et al.  Evolutionary Optimization: Pitfalls and Booby Traps , 2012, Journal of Computer Science and Technology.

[39]  Yang Zhang,et al.  Applying Cost-Sensitive Multiobjective Genetic Programming to Feature Extraction for Spam E-mail Filtering , 2008, EuroGP.

[40]  Francisco Herrera,et al.  A Survey on the Application of Genetic Programming to Classification , 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[41]  E. Zitzler,et al.  Multiobjective evolutionary algorithms , 1999 .

[42]  Qingfu Zhang,et al.  MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition , 2007, IEEE Transactions on Evolutionary Computation.

[43]  Mark Johnston,et al.  Evolving Diverse Ensembles Using Genetic Programming for Classification With Unbalanced Data , 2013, IEEE Transactions on Evolutionary Computation.

[44]  John R. Koza,et al.  Genetic programming - on the programming of computers by means of natural selection , 1993, Complex adaptive systems.

[45]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.