The potential benefit of relevance vector machine to software effort estimation

Three key challenges faced by the task of software effort estimation (SEE) when using predictive models are: (1) in order to support decision-making, software managers should have access not only to the effort estimation given by the predictive model, but also how confident this model is in estimating a given project and how likely other effort values could be the real efforts required to develop this project, (2) SEE data is likely to contain noise, due to the participation of humans in the data collection, and this noise can hinder predictions if not catered, and (3) data collection is an expensive task, and guidelines on when new data need to be collected would be helpful for reducing the cost associated with data collection. However, even though SEE has been studied for decades and many predictors have been proposed, few methods focus on these issues. In this work, we show that relevance vector machine (RVM) is a promising predictive method for addressing these three challenges. More specifically, it explicitly handles noise, it provides probabilistic predictions of effort, and can be used to identify when the required efforts of new projects should be collected for using them as training examples. With that in mind, this work provides the first step in exploiting RVM's potential for SEE by validating both its point prediction and prediction intervals. It then explains in detail future directions in terms of how RVMs can be further exploited for addressing the above mentioned challenges. Our systematic experiments show that RVM is very competitive compared with state-of-the-art SEE approaches, being usually ranked the first or second in 7 across 11 data sets in terms of mean absolute error. We also demonstrate how RVM can be used to judge the amount of noise present in the data. In summary, we show that RVM is a very promising predictor for SEE and should be further exploited.

[1]  Michael E. Tipping,et al.  Analysis of Sparse Bayesian Learning , 2001, NIPS.

[2]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[3]  Adam Trendowicz,et al.  Handling Estimation Uncertainty with Bootstrapping: Empirical Evaluation in the Context of Hybrid Prediction Methods , 2011, 2011 International Symposium on Empirical Software Engineering and Measurement.

[4]  Yong Hu,et al.  Systematic literature review of machine learning based software development effort estimation models , 2012, Inf. Softw. Technol..

[5]  Martin J. Shepperd,et al.  Estimating Software Project Effort Using Analogies , 1997, IEEE Trans. Software Eng..

[6]  Mark Harman,et al.  Search-Based Software Project Management , 2014, Software Project Management in a Changing World.

[7]  Stephen G. MacDonell,et al.  Evaluating prediction systems in software project estimation , 2012, Inf. Softw. Technol..

[8]  Xin Yao,et al.  Software effort estimation as a multiobjective learning problem , 2013, TSEM.

[9]  Ioannis Stamelos,et al.  Software productivity and effort prediction with ordinal regression , 2005, Inf. Softw. Technol..

[10]  Ioannis Stamelos,et al.  Managing uncertainty in project portfolio cost estimation , 2001, Inf. Softw. Technol..

[11]  Tim Menzies,et al.  Active learning and effort estimation: Finding the essential content of software effort estimation data , 2013, IEEE Transactions on Software Engineering.

[12]  Parag C. Pendharkar,et al.  A probabilistic model for predicting software development effort , 2003, IEEE Transactions on Software Engineering.

[13]  Lisa Werner,et al.  Principles of forecasting: A handbook for researchers and practitioners , 2002 .

[14]  Xin Yao,et al.  The impact of parameter tuning on software effort estimation using learning machines , 2013, PROMISE.

[15]  Barbara A. Kitchenham,et al.  A Simulation Study of the Model Evaluation Criterion MMRE , 2003, IEEE Trans. Software Eng..

[16]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[17]  George Eastman House,et al.  Sparse Bayesian Learning and the Relevance Vector Machine , 2001 .

[18]  Tim Menzies,et al.  The \{PROMISE\} Repository of Software Engineering Databases. , 2005 .

[19]  Ioannis Stamelos,et al.  On the use of Bayesian belief networks for the prediction of software productivity , 2003, Inf. Softw. Technol..

[20]  Tim Menzies,et al.  On the Value of Ensemble Effort Estimation , 2012, IEEE Transactions on Software Engineering.

[21]  Bojan Cukic,et al.  Building a second opinion: learning cross-company data , 2013, PROMISE.

[22]  Xin Yao,et al.  Ensembles and locality: Insight on improving software effort estimation , 2013, Inf. Softw. Technol..

[23]  Magne Jørgensen,et al.  Evidence-based guidelines for assessment of software development cost uncertainty , 2005, IEEE Transactions on Software Engineering.

[24]  Ioannis Stamelos,et al.  A Simulation Tool for Efficient Analogy Based Cost Estimation , 2000, Empirical Software Engineering.

[25]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[26]  Magne Jørgensen,et al.  An effort prediction interval approach based on the empirical distribution of previous estimation accuracy , 2003, Inf. Softw. Technol..

[27]  Geoffrey E. Hinton,et al.  Bayesian Learning for Neural Networks , 1995 .

[28]  Silvio Romero de Lemos Meira,et al.  Bagging Predictors for Estimation of Software Project Effort , 2007, 2007 International Joint Conference on Neural Networks.

[29]  Emilia Mendes,et al.  Bayesian Network Models for Web Effort Prediction: A Comparative Study , 2008, IEEE Transactions on Software Engineering.

[30]  Ayse Basar Bener,et al.  Ensemble of neural networks with associative memory (ENNA) for estimating software development costs , 2009, Knowl. Based Syst..

[31]  Lionel C. Briand,et al.  An assessment and comparison of common software cost estimation modeling techniques , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[32]  Bart Baesens,et al.  Data Mining Techniques for Software Effort Estimation: A Comparative Study , 2012, IEEE Transactions on Software Engineering.

[33]  J. Scott Armstrong The Forecasting Dictionary Updated : October 23 , 2000 , .

[34]  B. Baskeles,et al.  Software effort estimation using machine learning methods , 2007, 2007 22nd international symposium on computer and information sciences.

[35]  Magne Jørgensen,et al.  Better sure than safe? Over-confidence in judgement based software development effort prediction intervals , 2004, J. Syst. Softw..

[36]  Xin Yao,et al.  How to make best use of cross-company data in software effort estimation? , 2014, ICSE.

[37]  P. Groenen,et al.  Modern Multidimensional Scaling: Theory and Applications , 1999 .

[38]  Tim Menzies,et al.  How to Find Relevant Data for Effort Estimation? , 2011, 2011 International Symposium on Empirical Software Engineering and Measurement.

[39]  Barry W. Boehm,et al.  Bayesian Analysis of Empirical Software Engineering Cost Models , 1999, IEEE Trans. Software Eng..

[40]  Bojan Cukic,et al.  Predicting more from less: Synergies of learning , 2013, 2013 2nd International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE).

[41]  W. Bryc The Normal Distribution: Characterizations with Applications , 1995 .