Application-aware metrics for partition selection in cube-shaped topologies

Abstract: Non-contiguous partitioning strategies are often used to select a set of nodes of a parallel computer and assign it to a particular job. Their main advantage over contiguous strategies is reduced system fragmentation. However, without contiguity, locality in communications cannot be easily exploited, which results in longer job execution times. Several metrics have been proposed in the literature to assess how well suited a partition is to running an application. These metrics are computed from the dispersion of the partition. In this paper we demonstrate that metrics based solely on dispersion are not always valid. Using simulation, we show that, for some applications, the dispersion-based metrics of a partition do not correlate with the execution times of applications running on it. We define new metrics that consider not only partition-related properties but also the application's communication patterns and the path diversity available to communicating tasks. We evaluate these metrics in 2D and 3D meshes, using the NAS Parallel Benchmarks suite of applications as the test workload. A simulation-based study was carried out with a large set of partitions. Results show that metrics that include information about the traffic patterns of applications have consistently strong, positive correlations with execution times.
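To make the distinction between the two families of metrics concrete, the following minimal sketch contrasts a purely dispersion-based measure with a traffic-weighted one on a 2D mesh. It is an illustration under stated assumptions, not the metrics defined in the paper: the function names, the use of Manhattan distance as the hop count, and the toy traffic matrix are all hypothetical, and path diversity is not modeled.

```python
from itertools import combinations


def manhattan(a, b):
    """Hop distance between two mesh nodes (assumes minimal routing, no wraparound)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_pairwise_distance(partition):
    """Dispersion-only measure: mean distance over all node pairs in the partition."""
    pairs = list(combinations(partition, 2))
    return sum(manhattan(a, b) for a, b in pairs) / len(pairs)


def traffic_weighted_distance(partition, traffic):
    """Application-aware measure (hypothetical): mean distance weighted by traffic.

    partition[i] holds the coordinates of the node running task i;
    traffic[(i, j)] is the volume exchanged between tasks i and j.
    """
    total = sum(traffic.values())
    weighted = sum(
        volume * manhattan(partition[i], partition[j])
        for (i, j), volume in traffic.items()
    )
    return weighted / total


# Two 4-node partitions of a 2D mesh: one compact, one split into two distant pairs.
compact = [(0, 0), (0, 1), (1, 0), (1, 1)]
scattered = [(0, 0), (0, 1), (7, 0), (7, 1)]

# Toy pattern: task 0 talks only to task 1, and task 2 only to task 3.
traffic = {(0, 1): 100, (2, 3): 100}

for name, part in (("compact", compact), ("scattered", scattered)):
    print(
        name,
        round(average_pairwise_distance(part), 3),
        traffic_weighted_distance(part, traffic),
    )
```

In this toy case the dispersion-only measure rates the scattered partition much worse than the compact one, while the traffic-weighted measure is identical for both, because every communicating pair sits on adjacent nodes. This is the kind of mismatch that can leave dispersion uncorrelated with execution time.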
