Software Module Clustering as a Multi-Objective Search Problem

Software module clustering is the problem of automatically organizing software units into modules to improve program structure. There has been a great deal of recent interest in search-based formulations of this problem, in which module boundaries are identified by automated search guided by a single-objective fitness function that combines the twin objectives of high cohesion and low coupling. This paper introduces two novel multi-objective formulations of the software module clustering problem, in which several different objectives (including cohesion and coupling) are represented separately. To evaluate the effectiveness of the multi-objective approach, a set of experiments was performed on 17 real-world module clustering problems. The results of this empirical study provide strong evidence that the multi-objective approach produces significantly better solutions than the existing single-objective approach.
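To illustrate what representing the objectives separately can mean in practice, the following is a minimal sketch and not the formulation evaluated in the paper. It assumes a small hypothetical dependency graph, measures cohesion as the number of intra-module edges and coupling as the number of inter-module edges, and adds the number of modules as an assumed third objective so that the objectives can genuinely conflict. Candidate clusterings are then compared by Pareto dominance rather than by a single combined score. All names (`EDGES`, `objectives`, `dominates`, `pareto_front`) are introduced here for illustration only.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Clustering = Dict[str, int]  # software unit -> module id

# Hypothetical dependency graph between four software units.
EDGES: List[Edge] = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]


def objectives(edges: List[Edge], clustering: Clustering) -> Tuple[int, int, int]:
    """Score a clustering as a vector in which every entry is maximized.

    Entries: intra-module edges (cohesion), negated inter-module edges
    (coupling, negated so that larger is better), and number of modules
    (an assumed extra objective, not taken from the source).
    """
    intra = sum(1 for u, v in edges if clustering[u] == clustering[v])
    inter = len(edges) - intra
    modules = len(set(clustering.values()))
    return intra, -inter, modules


def dominates(p: Tuple[int, ...], q: Tuple[int, ...]) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere."""
    no_worse = all(x >= y for x, y in zip(p, q))
    strictly_better = any(x > y for x, y in zip(p, q))
    return no_worse and strictly_better


def pareto_front(edges: List[Edge], candidates: Dict[str, Clustering]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    scores = {name: objectives(edges, c) for name, c in candidates.items()}
    return [
        name
        for name, score in scores.items()
        if not any(dominates(other, score) for other in scores.values())
    ]


CANDIDATES: Dict[str, Clustering] = {
    "one_module": {"a": 0, "b": 0, "c": 0, "d": 0},
    "good_split": {"a": 0, "b": 0, "c": 0, "d": 1},
    "bad_split": {"a": 0, "b": 1, "c": 1, "d": 0},
    "singletons": {"a": 0, "b": 1, "c": 2, "d": 3},
}

if __name__ == "__main__":
    for name in pareto_front(EDGES, CANDIDATES):
        print(name, objectives(EDGES, CANDIDATES[name]))
```

In this toy instance, `bad_split` is dominated by `good_split`, while the remaining three candidates trade the objectives off against one another and are all retained. A single-objective fitness function would instead have to commit to one weighting of the objectives before the search begins.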
