A Tutorial on the Performance Assessment of Stochastic Multiobjective Optimizers

Aims

Main goal: how to evaluate and compare the approximation sets produced by multiple runs of two or more stochastic multiobjective optimizers.

We recommend two complementary approaches:

• Empirical attainment function
  – Applies statistical tests directly to the samples of approximation sets
  – Gives detailed information about how and where performance differences occur

• Dominance-compliant quality indicators
  – First reduces each approximation set to a single quality value
  – Then applies statistical tests to the samples of quality values

Scope

• We do provide:
  – A description of the empirical attainment function
  – Recommendations for quality indicators to use
  – Software for the indicator and empirical attainment function approaches
  – Statistical testing procedures/software for both EAFs and indicators
  – A case study giving a worked example, using the software provided

• We do not consider:
  – The number of alternative solutions found in decision space
  – Time or computational cost of the optimizer
  – Test function choice
  – Scalability of optimizers with the number of objectives / decision variables
  – …and many other issues

Overview

• Part 1 – Introduction
  – Aims and scope
  – Definitions and basics of Pareto dominance
  – The limitations of dominance relations

• Part 2 – Methods
  – Special case: nondominated sorting of approximation sets
  – 1st approach: empirical attainment functions
  – 2nd approach: (dominance-compliant) quality indicators

• Part 3 – In Practice
  – Software guide (PISA framework)
  – Case study

"And what is good, Phaedrus, and what is not good? Need we ask any one to tell us these things?" – Socrates

[Figure: two example approximation sets A and B. In one panel the two sets are incomparable; in the panel on the right, A is much better. Overall, A is better to "most decision-makers in most situations".]

Summary of Part 1

• Pareto dominance relations extend from objective vectors to approximation sets (see the first sketch after this list)
• Dominance is independent of (linear or non-linear) scaling of the objectives, enabling comparison of vectors/sets even when objectives are non-commensurable
• Dominance does not use or account for preference information
  – It cannot detect degrees of "better" (it is just a binary relation)
  – It cannot detect differences between incomparable sets
• Algorithms are stochastic – different sets are attained with different frequencies. How can this be measured?
• Using dominance relations only to compare stochastic optimizers – a special case
• Empirical attainment functions – describing the frequency distribution of the attained regions (see the second sketch after this list)
• Quality indicators – reducing the dimension of approximation sets while still respecting dominance
• …
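The following is a minimal Python sketch of the dominance relations referred to above, assuming all objectives are to be minimized. The function names (dominates, weakly_dominates, set_better) and the toy data are illustrative only; they are not part of the software provided with this tutorial.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization assumed):
    a is no worse in every objective and strictly better in at least one."""
    return (all(ai <= bi for ai, bi in zip(a, b))
            and any(ai < bi for ai, bi in zip(a, b)))

def weakly_dominates(a, b):
    """True if a is no worse than b in every objective."""
    return all(ai <= bi for ai, bi in zip(a, b))

def set_better(A, B):
    """Set-level 'better' relation: every vector in B is weakly dominated by
    some vector in A, and the two sets are not identical."""
    covered = all(any(weakly_dominates(a, b) for a in A) for b in B)
    return covered and set(map(tuple, A)) != set(map(tuple, B))

# Toy example with two objectives (both minimized).
A = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
B = [(2.0, 4.0), (3.0, 3.0)]
print(dominates((1.0, 4.0), (2.0, 4.0)))  # True
print(set_better(A, B))                   # True: A is better than B
print(set_better(B, A))                   # False: B is not better than A
```

Note that set_better is still only a binary check: it reports that A is better than B, but not by how much, which is exactly the limitation discussed above.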
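As a second sketch, the empirical attainment function can be estimated pointwise from a sample of runs: at a goal point z it is the fraction of runs whose approximation set attains (weakly dominates) z. The code below assumes minimization; attains, eaf_value, and the example runs are hypothetical and stand in for the tutorial's EAF software rather than reproduce it.

```python
def attains(approx_set, z):
    """True if the approximation set attains goal point z, i.e. at least one
    of its objective vectors weakly dominates z (minimization assumed)."""
    return any(all(ai <= zi for ai, zi in zip(a, z)) for a in approx_set)

def eaf_value(runs, z):
    """Empirical attainment function at z: the fraction of independent runs
    whose approximation set attains z."""
    return sum(attains(run, z) for run in runs) / len(runs)

# Three hypothetical runs of one optimizer (two objectives, both minimized).
runs = [
    [(1.0, 5.0), (3.0, 2.0)],
    [(2.0, 4.0), (4.0, 1.5)],
    [(1.5, 6.0)],
]
print(eaf_value(runs, (3.0, 3.0)))  # 1/3: only the first run attains (3.0, 3.0)
print(eaf_value(runs, (4.0, 4.0)))  # 2/3: the first two runs attain (4.0, 4.0)
```

Evaluating eaf_value over a grid of goal points gives the frequency distribution of attained regions that Part 2 uses to localize where two optimizers differ.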