Learning to classify software defects from crowds: A novel approach

Abstract

In software engineering, associating each reported defect with a category allows, among other things, for the appropriate allocation of resources. Although this classification task can be automated using standard machine learning techniques, labeling defects to train such models requires expert knowledge, which is not always available. To circumvent this dependency, we propose to apply the learning from crowds paradigm, in which training labels are obtained from multiple non-expert annotators (and so may be incomplete, noisy or erroneous) and classifiers are learnt efficiently from this subjective class information. To illustrate our proposal, we present two real-world applications of IBM's Orthogonal Defect Classification (ODC) to the issue tracking systems of two different domains. Bayesian network classifiers, learnt with two state-of-the-art methodologies from data labeled by a crowd of annotators, are used to predict the category (impact) of reported software defects. Both methodologies outperform the straightforward solution (majority voting) on several metrics. This shows the potential of non-expert knowledge aggregation techniques when expert knowledge is unavailable.
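The abstract contrasts a straightforward aggregation baseline (majority voting) with learning-from-crowds methods that account for annotator reliability. The two methodologies used in the paper are not named here, so the sketch below swaps in the classic Dawid-Skene EM estimator purely as an illustration of the idea; it is not the authors' method. The function names, the input format (one `{annotator_id: label}` dict per defect report), the smoothing constant and the iteration count are all assumptions made for this example.

```python
import math
from collections import Counter


def majority_vote(item_labels, classes):
    """Baseline aggregation: one hard label per item (the most voted class).

    item_labels: list with one {annotator_id: label} dict per defect report.
    Ties are resolved in favour of the class listed first in `classes`.
    """
    result = []
    for votes in item_labels:
        counts = Counter(votes.values())
        # max() keeps the first maximal element, giving a deterministic tie-break.
        result.append(max(classes, key=lambda c: counts[c]))
    return result


def dawid_skene(item_labels, classes, n_iter=50, smoothing=0.01):
    """Dawid-Skene EM: soft posteriors over the true class of each item.

    Illustrative stand-in for a learning-from-crowds aggregator; NOT the
    paper's methodology. Assumes every item has at least one label.
    """
    K = len(classes)
    idx = {c: k for k, c in enumerate(classes)}
    annotators = sorted({a for votes in item_labels for a in votes})

    # Initialise posteriors with normalised vote counts (a "soft" majority vote).
    post = []
    for votes in item_labels:
        if not votes:
            raise ValueError("every item needs at least one annotation")
        counts = [0.0] * K
        for lab in votes.values():
            counts[idx[lab]] += 1.0
        post.append([c / len(votes) for c in counts])

    for _ in range(n_iter):
        # M-step: class priors and one smoothed confusion matrix per annotator,
        # where conf[a][k][l] estimates P(annotator a says l | true class k).
        prior = [sum(p[k] for p in post) / len(post) for k in range(K)]
        conf = {a: [[smoothing] * K for _ in range(K)] for a in annotators}
        for p, votes in zip(post, item_labels):
            for a, lab in votes.items():
                for k in range(K):
                    conf[a][k][idx[lab]] += p[k]
        for a in annotators:
            for k in range(K):
                row_sum = sum(conf[a][k])
                conf[a][k] = [v / row_sum for v in conf[a][k]]

        # E-step: posterior over the true class, computed in log space.
        post = []
        for votes in item_labels:
            logp = [math.log(max(prior[k], 1e-12)) for k in range(K)]
            for a, lab in votes.items():
                for k in range(K):
                    logp[k] += math.log(conf[a][k][idx[lab]])
            top = max(logp)
            w = [math.exp(v - top) for v in logp]
            total = sum(w)
            post.append([v / total for v in w])
    return post
```

Majority voting treats every annotator as equally reliable, whereas the EM estimator learns a confusion matrix per annotator, so an annotator who always answers the same category ends up carrying little weight. Either the hard labels or the soft posteriors could then be passed to a classifier such as a Bayesian network classifier; that training step is outside the scope of this sketch.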
