Appropriateness of performance indices for imbalanced data classification: An analysis

Abstract: Indices that quantify the performance of classifiers under class imbalance often suffer from distortions depending on the constitution of the test set or the class-specific classification accuracy, making it difficult to assess the merit of the classifier. We identify two fundamental conditions that a performance index must satisfy in order to be resilient, respectively, to changes in the number of test instances from each class and to changes in the number of classes in the test set. In light of these conditions, and under the effect of class imbalance, we theoretically analyze four indices commonly used for evaluating binary classifiers and five popular indices for evaluating multi-class classifiers. For the indices that violate either of the conditions, we also suggest remedial modifications and normalizations. We further investigate the extent to which the indices retain information about classification performance over all classes, even when the classifier exhibits extreme performance on some of them. Simulation studies are performed on high-dimensional deep representations of a subset of the ImageNet dataset using four state-of-the-art classifiers tailored for handling class imbalance. Finally, based on our theoretical findings and empirical evidence, we recommend the appropriate indices to be used for evaluating classifier performance in the presence of class imbalance.
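To make the distortion concrete, the following minimal Python sketch contrasts a few commonly used indices on a hypothetical, heavily imbalanced binary confusion matrix. The indices shown (overall accuracy, balanced accuracy, and the G-mean of per-class recalls) are illustrative choices and not necessarily the exact indices analyzed in the paper; the confusion-matrix counts are invented solely to show how overall accuracy can mask poor minority-class performance.

import numpy as np

# Hypothetical confusion matrix for a binary problem with heavy class imbalance:
# rows = true class, columns = predicted class; class 0 is the majority class.
cm = np.array([[950, 50],
               [ 30, 20]])

per_class_recall = cm.diagonal() / cm.sum(axis=1)        # recall of each class: 0.95 and 0.40
accuracy = cm.diagonal().sum() / cm.sum()                # dominated by the majority class
balanced_accuracy = per_class_recall.mean()              # arithmetic mean of per-class recalls
g_mean = np.prod(per_class_recall) ** (1 / len(per_class_recall))  # geometric mean of recalls

print(f"accuracy          = {accuracy:.3f}")             # ~0.924, despite only 40% recall on the minority class
print(f"balanced accuracy = {balanced_accuracy:.3f}")    # ~0.675
print(f"G-mean            = {g_mean:.3f}")               # ~0.616

Indices such as balanced accuracy and the G-mean weight every class equally, so they are far less sensitive to how many test instances each class contributes, which is the kind of resilience the conditions discussed in the abstract formalize.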
