Adaptive communication algorithms for distributed heterogeneous systems

Heterogeneous network-based systems are emerging as attractive computing platforms for HPC applications. We discuss fundamental research issues that must be addressed to enable network-aware communication at the application level. We present a uniform framework for developing adaptive communication schedules for various collective communication patterns. Schedules are developed at run time, based on network performance information obtained from a directory service. We illustrate our framework by developing communication schedules for total exchange. Our first algorithm develops a schedule by computing a series of matchings in a bipartite graph. We also present an O(P^3) heuristic algorithm, based on the open shop scheduling problem, whose completion time is within twice the optimal. Simulation results show performance improvements of a factor of 5 over well-known homogeneous scheduling techniques.
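As a rough illustration of the matching-based idea, the sketch below decomposes a total exchange among P nodes into steps, where each step is a matching between senders and receivers. It is a minimal sketch under stated assumptions, not the algorithm from the paper: the names (`cost`, `perfect_matching`, `total_exchange_schedule`, `completion_time`, `lower_bound`) are hypothetical, the matching is unweighted (it ignores the costs when pairing nodes), steps are assumed to be synchronized, and each node is assumed to send and receive at most one message at a time.

```python
def perfect_matching(adj, p):
    """Maximum bipartite matching via augmenting paths (Kuhn's algorithm).

    adj[i] is the set of receivers that sender i still has to reach.
    Returns a dict mapping sender -> receiver.
    """
    sender_of = [-1] * p  # sender_of[r] = sender currently matched to receiver r

    def try_assign(sender, seen):
        for receiver in adj[sender]:
            if receiver in seen:
                continue
            seen.add(receiver)
            # Take a free receiver, or re-route the sender that holds it.
            if sender_of[receiver] == -1 or try_assign(sender_of[receiver], seen):
                sender_of[receiver] = sender
                return True
        return False

    for sender in range(p):
        try_assign(sender, set())
    return {s: r for r, s in enumerate(sender_of) if s != -1}


def total_exchange_schedule(cost):
    """Split all ordered pairs (i, j), i != j, into a series of matchings."""
    p = len(cost)
    remaining = [{j for j in range(p) if j != i} for i in range(p)]
    steps = []
    while any(remaining):
        matching = perfect_matching(remaining, p)
        for s, r in matching.items():
            remaining[s].discard(r)
        steps.append(matching)
    return steps


def completion_time(cost, steps):
    """Assumes synchronized steps: each step lasts as long as its slowest transfer."""
    return sum(max(cost[s][r] for s, r in step.items()) for step in steps)


def lower_bound(cost):
    """No one-port schedule can beat the busiest sender or receiver."""
    p = len(cost)
    send = max(sum(cost[i][j] for j in range(p) if j != i) for i in range(p))
    recv = max(sum(cost[i][j] for i in range(p) if i != j) for j in range(p))
    return max(send, recv)


# Hypothetical pairwise transfer times (e.g. as reported by a directory service).
cost = [
    [0, 1, 4, 2],
    [1, 0, 3, 5],
    [4, 3, 0, 1],
    [2, 5, 1, 0],
]
steps = total_exchange_schedule(cost)
print(len(steps), completion_time(cost, steps), lower_bound(cost))
```

Because the remaining sender-receiver graph is regular at every iteration, each matching is perfect and the schedule has exactly P - 1 steps. Comparing `completion_time` against `lower_bound` shows how much a cost-oblivious schedule can lose on a heterogeneous network; a cost-aware choice of matchings, or an open-shop-style schedule without step synchronization, would aim to close that gap.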
