Efficient collective communication in distributed heterogeneous systems

With recent advances in high-speed networks, distributed heterogeneous computing has emerged as an attractive computational paradigm. Wide-area grid infrastructures will enable distributed applications, such as video conferencing and distributed interactive simulation, to seamlessly integrate collections of heterogeneous workstations, multiprocessors, and mobile nodes. The underlying network is typically a collection of heterogeneous links built on different networking technologies. Such heterogeneous networks are also typical of local-area workstation clusters, which are increasingly used as alternatives to parallel computing systems. This paper introduces a framework for developing efficient collective communication schedules over such heterogeneous networks. We focus on application-level communication between the processes of a parallel program. Our framework consists of analytical models of the heterogeneous system, scheduling algorithms for the collective communication pattern, and performance evaluation mechanisms. We show that previous models, which considered node heterogeneity but ignored network heterogeneity, can lead to solutions that are worse than optimal by an unbounded factor. We then introduce an enhanced communication model and develop three heuristic algorithms for the broadcast and multicast patterns, using the completion time of the schedule as the performance metric. The heuristic algorithms are fastest edge first (FEF), earliest completing edge first (ECEF), and ECEF with look-ahead. For small system sizes, we find the optimal solution by exhaustive search, and our simulation experiments indicate that the performance of the heuristic algorithms is close to optimal. For performance evaluation of larger systems, we have also developed a simple lower bound on the completion time. Our heuristic algorithms achieve significant performance improvements over previous approaches.
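The abstract names the FEF and ECEF heuristics without defining them, so the following is only a minimal sketch of how such greedy broadcast schedulers might look, not the paper's implementation. It assumes a pairwise cost matrix `cost[i][j]` (time for node `i` to deliver the message to node `j`), that a node sends to one receiver at a time, and that a receiver can forward as soon as its receive finishes. Under those assumptions, FEF is read as "pick the cheapest edge from an informed node to an uninformed node", and ECEF as "pick the edge that would finish earliest, counting the time at which the sender becomes free". The names `greedy_broadcast`, `completion_time`, `use_ready_time`, and `EXAMPLE_COST` are hypothetical.

```python
from typing import Dict, List, Tuple

# (sender, receiver, start time, finish time)
Event = Tuple[int, int, float, float]


def greedy_broadcast(cost: List[List[float]], source: int,
                     use_ready_time: bool) -> List[Event]:
    """Build a broadcast schedule greedily, one receiver per step.

    use_ready_time=False: FEF-style choice, smallest edge cost only.
    use_ready_time=True:  ECEF-style choice, smallest (sender ready time + edge cost).
    Both readings are assumptions; the abstract gives only the heuristic names.
    """
    n = len(cost)
    # ready[i] is the time at which informed node i can start its next send.
    ready: Dict[int, float] = {source: 0.0}
    pending = set(range(n)) - {source}
    schedule: List[Event] = []

    while pending:
        def key(edge: Tuple[int, int]) -> float:
            i, j = edge
            return cost[i][j] + (ready[i] if use_ready_time else 0.0)

        # Candidate edges go from informed nodes to uninformed nodes.
        candidates = [(i, j) for i in sorted(ready) for j in sorted(pending)]
        i, j = min(candidates, key=key)

        start = ready[i]
        finish = start + cost[i][j]
        ready[i] = finish      # the sender is busy until the transfer ends
        ready[j] = finish      # the receiver may forward from then on
        pending.remove(j)
        schedule.append((i, j, start, finish))

    return schedule


def completion_time(schedule: List[Event]) -> float:
    """Time at which the last receiver obtains the message."""
    return max((finish for _, _, _, finish in schedule), default=0.0)


# Hypothetical 4-node cost matrix, chosen so the two rules disagree.
EXAMPLE_COST = [
    [0, 1, 3, 4],
    [1, 0, 5, 5],
    [3, 5, 0, 6],
    [4, 5, 6, 0],
]

fef = greedy_broadcast(EXAMPLE_COST, source=0, use_ready_time=False)
ecef = greedy_broadcast(EXAMPLE_COST, source=0, use_ready_time=True)
print("FEF completion time:", completion_time(fef))    # 8.0
print("ECEF completion time:", completion_time(ecef))  # 6.0
```

On this made-up matrix the two rules agree on the first two transfers and differ on the last: the FEF-style rule reuses the busy source for the cheapest edge (cost 4, starting at time 4), while the ECEF-style rule hands the final transfer to node 1, which has been idle since time 1, and finishes earlier.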
