Adaptive Communication Algorithms for Distributed Heterogeneous Systems

Many grand challenge applications can benefit from metacomputing, i.e., the coordinated use of geographically distributed heterogeneous supercomputers. A salient feature of such systems is that network performance differs from one processor pair to another. This paper considers the problem of efficient application-level communication in heterogeneous network-based systems. We present a uniform communication scheduling framework for developing adaptive communication schedules for various collective communication patterns. The framework enables schedules to be developed at runtime, based on network performance information obtained from a directory service. Based on this framework, we have developed communication schedules for the total exchange communication pattern. Our first algorithm develops a schedule by computing a series of matchings in a bipartite graph. We also present a heuristic algorithm based on the open shop scheduling problem. The completion time of the schedule produced by this heuristic is guaranteed to be within twice the optimal. Simulation results show performance improvements by a factor of 5 over well-known homogeneous scheduling techniques. This paper is an early effort at formalizing and solving communication problems for metacomputing systems. We discuss several research issues that must be addressed to allow efficient collective communication in such environments.
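
To make the open shop view of total exchange concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a simple model that the abstract does not spell out: each processor can send at most one message and receive at most one message at a time, and `t[i][j]` is the estimated time to send the message from processor `i` to processor `j` (for example, as derived from directory-service measurements). All function names are hypothetical. The greedy scheduler repeatedly starts the pending transfer that can begin earliest; the fixed-step schedule, in which processor `i` sends to `(i + k) mod n` in step `k` with a barrier between steps, stands in for a homogeneous technique.

```python
def lower_bound(t):
    """Largest total send time or receive time of any single processor."""
    n = len(t)
    send = max(sum(t[i][j] for j in range(n) if j != i) for i in range(n))
    recv = max(sum(t[i][j] for i in range(n) if i != j) for j in range(n))
    return max(send, recv)


def fixed_step_makespan(t):
    """Homogeneous-style baseline: in step k, i sends to (i + k) mod n.

    Each step ends only when its slowest transfer finishes (assumed barrier).
    """
    n = len(t)
    return sum(max(t[i][(i + k) % n] for i in range(n)) for k in range(1, n))


def greedy_schedule(t):
    """Greedy open-shop-style schedule (illustrative assumption).

    Senders play the role of jobs and receivers the role of machines.
    Returns (schedule, makespan), where schedule holds
    (start, end, sender, receiver) tuples.
    """
    n = len(t)
    send_free = [0.0] * n  # time at which each sender becomes idle
    recv_free = [0.0] * n  # time at which each receiver becomes idle
    pending = [(i, j) for i in range(n) for j in range(n) if i != j]
    schedule = []
    while pending:
        # Earliest feasible start first; among ties, prefer the longest transfer.
        i, j = min(
            pending,
            key=lambda p: (max(send_free[p[0]], recv_free[p[1]]), -t[p[0]][p[1]]),
        )
        start = max(send_free[i], recv_free[j])
        end = start + t[i][j]
        send_free[i] = recv_free[j] = end
        schedule.append((start, end, i, j))
        pending.remove((i, j))
    makespan = max((end for _, end, _, _ in schedule), default=0.0)
    return schedule, makespan


# Hypothetical 3-processor system with one slow link between processors 0 and 2.
t_example = [
    [0, 1, 10],
    [1, 0, 1],
    [10, 1, 0],
]

if __name__ == "__main__":
    sched, span = greedy_schedule(t_example)
    print("lower bound:", lower_bound(t_example))
    print("fixed-step makespan:", fixed_step_makespan(t_example))
    print("greedy makespan:", span)
```

Because the greedy rule never leaves a sender and a receiver idle at the same moment while a transfer between them is still pending, its makespan in this model cannot exceed the sum of one sender's total load and one receiver's total load, which is at most twice `lower_bound(t)`. The sketch says nothing about how the paper's matching-based algorithm or its heuristic are actually constructed.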
