Software effort estimation as a multiobjective learning problem

Ensembles of learning machines are promising for software effort estimation (SEE), but need to be tailored for this task to have their potential exploited. A key issue when creating ensembles is to produce diverse and accurate base models. Depending on how differently different performance measures behave for SEE, they could be used as a natural way of creating SEE ensembles. We propose to view SEE model creation as a multiobjective learning problem. A multiobjective evolutionary algorithm (MOEA) is used to better understand the tradeoff among different performance measures by creating SEE models through the simultaneous optimisation of these measures. We show that the performance measures behave very differently, presenting sometimes even opposite trends. They are then used as a source of diversity for creating SEE ensembles. A good tradeoff among different measures can be obtained by using an ensemble of MOEA solutions. This ensemble performs similarly or better than a model that does not consider these measures explicitly. Besides, MOEA is also flexible, allowing emphasis of a particular measure if desired. In conclusion, MOEA can be used to better understand the relationship among performance measures and has shown to be very effective in creating SEE models.

[1]  B. Baskeles,et al.  Software effort estimation using machine learning methods , 2007, 2007 22nd international symposium on computer and information sciences.

[2]  Magne Jørgensen,et al.  A Systematic Review of Software Development Cost Estimation Studies , 2007, IEEE Transactions on Software Engineering.

[3]  Ellis Horowitz,et al.  Software Cost Estimation with COCOMO II , 2000 .

[4]  José Javier Dolado,et al.  On the problem of the software cost function , 2001, Inf. Softw. Technol..

[5]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[6]  Vincent Kanade,et al.  Clustering Algorithms , 2021, Wireless RF Energy Transfer in the Massive IoT Era.

[7]  Daryl Essam,et al.  Software project effort estimation using genetic programming , 2002, IEEE 2002 International Conference on Communications, Circuits and Systems and West Sino Expositions.

[8]  Yuan Zhao,et al.  Estimating LOC for information systems from their conceptual data models , 2006, ICSE.

[9]  Tim Menzies,et al.  The \{PROMISE\} Repository of Software Engineering Databases. , 2005 .

[10]  Xin Yao,et al.  A principled evaluation of ensembles of learning machines for software effort estimation , 2011, Promise '11.

[11]  L. Hedges,et al.  The Handbook of Research Synthesis , 1995 .

[12]  Xin Yao,et al.  Ensembles and locality: Insight on improving software effort estimation , 2013, Inf. Softw. Technol..

[13]  Karen T. Lum,et al.  Selecting Best Practices for Effort Estimation , 2006, IEEE Transactions on Software Engineering.

[14]  Magne Jørgensen,et al.  A Systematic Review of Software Development Cost Estimation Studies , 2007 .

[15]  José Demisio Simões da Silva,et al.  Comparison of Artificial Neural Network and Regression Models in Software Effort Estimation , 2007, 2007 International Joint Conference on Neural Networks.

[16]  Martin Lukasiewycz,et al.  Opt4J: a modular framework for meta-heuristic optimization , 2011, GECCO '11.

[17]  Abbas Heiat,et al.  Comparison of artificial neural network and regression models for estimating software development effort , 2002, Inf. Softw. Technol..

[18]  Xin Yao,et al.  Ensemble learning via negative correlation , 1999, Neural Networks.

[19]  Bart Baesens,et al.  Data Mining Techniques for Software Effort Estimation: A Comparative Study , 2012, IEEE Transactions on Software Engineering.

[20]  R. Conradi,et al.  Effort estimation of use cases for incremental large-scale software development , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[21]  J. Periaux,et al.  Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems , 2001 .

[22]  Qinbao Song,et al.  Dealing with missing software project data , 2003, Proceedings. 5th International Workshop on Enterprise Networking and Computing in Healthcare Industry (IEEE Cat. No.03EX717).

[23]  Magne Jørgensen,et al.  The role of outcome feedback in improving the uncertainty assessment of software development effort estimates , 2008, TSEM.

[24]  R. Agarwal,et al.  Estimating software projects , 2001, SOEN.

[25]  Douglas Fisher,et al.  Machine Learning Approaches to Estimating Software Development Effort , 1995, IEEE Trans. Software Eng..

[26]  Xin Yao,et al.  Diversity creation methods: a survey and categorisation , 2004, Inf. Fusion.

[27]  Tim Menzies,et al.  Special issue on repeatable results in software engineering prediction , 2012, Empirical Software Engineering.

[28]  Kalyanmoy Deb,et al.  Muiltiobjective Optimization Using Nondominated Sorting in Genetic Algorithms , 1994, Evolutionary Computation.

[29]  Shane Legg,et al.  Tournament versus fitness uniform selection , 2004, Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat. No.04TH8753).

[30]  Yuan Zhao,et al.  Conceptual data model-based software size estimation for information systems , 2009, TSEM.

[31]  Gavin R. Finnie,et al.  Using Artificial Neural Networks and Function Points to Estimate 4GL Software Development Effort , 1994, Australas. J. Inf. Syst..

[32]  Xin Yao,et al.  Ensemble Learning Using Multi-Objective Evolutionary Algorithms , 2006, J. Math. Model. Algorithms.

[33]  Magne Jørgensen,et al.  The Impact of Irrelevant and Misleading Information on Software Development Effort Estimates: A Randomized Controlled Field Experiment , 2011, IEEE Transactions on Software Engineering.

[34]  Y. Zhao,et al.  Comparison of decision tree methods for finding active objects , 2007, 0708.4274.

[35]  Huanhuan Chen,et al.  Regularized Negative Correlation Learning for Neural Network Ensembles , 2009, IEEE Transactions on Neural Networks.

[36]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[37]  Giovanni Denaro,et al.  ACM Transactions on Software Engineering and Methodology : Volume 22, Nomor 4, 2013 , 2014 .

[38]  Ludmila I. Kuncheva,et al.  Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy , 2003, Machine Learning.

[39]  Xin Yao,et al.  Performance Scaling of Multi-objective Evolutionary Algorithms , 2003, EMO.

[40]  John A. Clark,et al.  Metrics are fitness functions too , 2004, 10th International Symposium on Software Metrics, 2004. Proceedings..

[41]  Jacob Cohen,et al.  A power primer. , 1992, Psychological bulletin.

[42]  Ferdinand Hergert,et al.  Improving model selection by nonconvergent methods , 1993, Neural Networks.

[43]  Barry W. Boehm,et al.  Software Engineering Economics , 1993, IEEE Transactions on Software Engineering.

[44]  Xin Yao,et al.  A New Multi-objective Evolutionary Optimisation Algorithm: The Two-Archive Algorithm , 2006, 2006 International Conference on Computational Intelligence and Security.

[45]  Gavin R. Finnie,et al.  Estimating software development effort with connectionist models , 1997, Inf. Softw. Technol..

[46]  Barbara A. Kitchenham,et al.  A Simulation Study of the Model Evaluation Criterion MMRE , 2003, IEEE Trans. Software Eng..

[47]  Martin J. Shepperd,et al.  Estimating Software Project Effort Using Analogies , 1997, IEEE Trans. Software Eng..

[48]  Doo-Hwan Bae,et al.  An empirical analysis of software effort estimation with outlier elimination , 2008, PROMISE '08.

[49]  Marco Laumanns,et al.  SPEA2: Improving the strength pareto evolutionary algorithm , 2001 .

[50]  Xin Yao,et al.  Multi-Objective Approaches to Optimal Testing Resource Allocation in Modular Software Systems , 2010, IEEE Transactions on Reliability.

[51]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[52]  Tim Menzies,et al.  On the Value of Ensemble Effort Estimation , 2012, IEEE Transactions on Software Engineering.

[53]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[54]  Silvio Romero de Lemos Meira,et al.  Bagging Predictors for Estimation of Software Project Effort , 2007, 2007 International Joint Conference on Neural Networks.

[55]  Ayse Basar Bener,et al.  Ensemble of neural networks with associative memory (ENNA) for estimating software development costs , 2009, Knowl. Based Syst..

[56]  Ekrem Kocaguneli,et al.  Combining Multiple Learners Induced on Multiple Datasets for Software Effort Prediction , 2009 .

[57]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[58]  Barry W. Boehm,et al.  Bayesian Analysis of Empirical Software Engineering Cost Models , 1999, IEEE Trans. Software Eng..

[59]  David E. Goldberg,et al.  Genetic Algorithms, Tournament Selection, and the Effects of Noise , 1995, Complex Syst..

[60]  R. Hanka The Handbook of Research Synthesis , 1994 .

[61]  Xin Yao,et al.  Software Module Clustering as a Multi-Objective Search Problem , 2011, IEEE Transactions on Software Engineering.

[62]  Xin Yao,et al.  Simultaneous training of negatively correlated neural networks in an ensemble , 1999, IEEE Trans. Syst. Man Cybern. Part B.

[63]  José Javier Dolado,et al.  A Validation of the Component-Based Method for Software Size Estimation , 2000, IEEE Trans. Software Eng..