Optimization-based mapping framework for parallel applications

Abstract: The mapping of the tasks of a parallel program onto the nodes of a parallel computing system has a significant impact on application performance. In this paper we propose an optimization framework to solve the mapping problem. The framework takes into account the communication matrix of the application and a cost matrix that depends on the topology of the parallel system. In the classic approach this cost matrix is a distance matrix; we propose a novel cost criterion, applicable to torus networks, that tries to distribute traffic evenly over the different axes. We call this the Traffic Distribution (TD) criterion. Because the mapping problem is a particular instance of the Quadratic Assignment Problem (QAP), any QAP solver can be applied to it; we use a greedy randomized algorithm. Using simulation, we measure the performance of the optimization-based mappings and compare it with that of trivial mappings (consecutive, random) in two environments: single application (one application uses all system resources all the time) and space sharing (several applications run simultaneously on different system partitions). The experiments use systems with 2D and 3D topologies and real application traffic. The results show that some applications do not benefit from optimization-based mappings: those in which the virtual and physical topologies already match, and those that carry out massive all-to-all communications. In the other cases, optimization-based mappings with the TD criterion provide excellent performance levels.
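
The QAP formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (torus_distance_matrix, mapping_cost, greedy_randomized_mapping) and the rcl_size parameter are hypothetical, the cost matrix is the classic hop-count distance on a torus (the TD criterion is not defined in the abstract, so it is not reproduced here), and the solver is only a simple greedy randomized construction without any local-search phase.

```python
import itertools
import random


def torus_distance_matrix(dims):
    """Hop-count distance between every pair of nodes of a torus (classic criterion)."""
    nodes = list(itertools.product(*[range(d) for d in dims]))

    def hops(a, b):
        # On each axis, take the shorter way around the ring.
        return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

    return [[hops(a, b) for b in nodes] for a in nodes]


def mapping_cost(comm, cost, mapping):
    """QAP objective: traffic between each task pair times the cost between their nodes."""
    n = len(comm)
    return sum(comm[i][j] * cost[mapping[i]][mapping[j]]
               for i in range(n) for j in range(n))


def greedy_randomized_mapping(comm, cost, rcl_size=2, rng=None):
    """Assumed construction: place tasks one by one, heaviest communicators first.

    For each task, the free nodes are ranked by the cost they add with respect
    to the tasks already placed, and one of the best rcl_size nodes is picked
    at random (a restricted candidate list).
    """
    rng = rng or random.Random(0)
    n = len(comm)
    order = sorted(range(n),
                   key=lambda t: -(sum(comm[t]) + sum(row[t] for row in comm)))
    mapping = {}
    free = set(range(n))
    for task in order:
        def added_cost(node):
            return sum((comm[task][u] + comm[u][task]) * cost[node][mapping[u]]
                       for u in mapping)

        candidates = sorted(free, key=lambda node: (added_cost(node), node))[:rcl_size]
        node = rng.choice(candidates)
        mapping[task] = node
        free.remove(node)
    return [mapping[t] for t in range(n)]


if __name__ == "__main__":
    # Hypothetical 16-task ring pattern mapped onto a 4x4 torus.
    n = 16
    comm = [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]
    cost = torus_distance_matrix((4, 4))
    consecutive = list(range(n))
    optimized = greedy_randomized_mapping(comm, cost)
    print("consecutive:", mapping_cost(comm, cost, consecutive))
    print("greedy randomized:", mapping_cost(comm, cost, optimized))
```

Replacing the matrix returned by torus_distance_matrix with a different cost matrix changes the criterion without touching the solver, which is the separation between cost definition and QAP solver that the abstract describes.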
