Communication scheduling techniques for distributed heterogeneous systems

Heterogeneous network-based systems are emerging as attractive computing platforms for a variety of grand challenge applications. Workstation clusters, consisting of various node types and network links, are increasingly being used as parallel computing platforms. Wide-area computational grid architectures, such as NASA's Information Power Grid (IPG), can dynamically integrate geographically distributed computing resources into networked virtual supercomputers. The heterogeneous computing nodes in such a metacomputing system are interconnected by several types of networks, such as Ethernet, ATM, and FDDI. This network heterogeneity, together with run-time performance variations, presents significant challenges for efficient communication. This dissertation introduces a uniform framework for developing communication schedules for collective communication patterns in such a heterogeneous system. Our framework consists of analytical models of the heterogeneous network, abstract representations of the communication pattern, and scheduling algorithms. Schedules are developed adaptively at run-time, based on network performance information obtained from a directory service. Our analytical models represent the communication performance between a pair of nodes as the sum of a latency component and a bandwidth component. Based on this framework, we have derived efficient communication schedules for total-exchange, cyclic redistribution, broadcast, and multicast. Our scheduling algorithms incorporate techniques from bipartite graph matching, spanning tree algorithms, and shop scheduling theory. For the total-exchange problem, the open shop algorithm produces schedules whose completion time is bounded by twice the optimal. For this problem, our simulation results show performance improvements of up to a factor of 5 over previous approaches. For the cyclic redistribution problem, we have implemented the open shop algorithm on a Cray T3E. Our results show consistent performance improvements of up to 60%. Our scheduling techniques for the broadcast and multicast problems are based on spanning tree algorithms. Performance improvements of over a factor of 10 are achieved. We have proposed several research directions for future work. These include enhancements to our analytical communication model, techniques to enhance the adaptivity of our schedules, faster heuristics, and communication scheduling in the presence of QoS constraints.
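As a rough illustration of the ideas summarized above, the following Python sketch combines a latency-plus-bandwidth cost model with a simple greedy scheduler for total-exchange, in which each node sends at most one message and receives at most one message at a time. It is a minimal sketch under stated assumptions and not the open shop algorithm developed in the dissertation: the function names (transfer_time, greedy_total_exchange, lower_bound), the per-pair latency and bandwidth tables, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# One scheduled transfer: (sender, receiver, start_time, end_time)
Transfer = Tuple[int, int, float, float]


def transfer_time(latency: float, bandwidth: float, size: float) -> float:
    """Assumed cost model: start-up latency plus message size over bandwidth."""
    return latency + size / bandwidth


def lower_bound(cost: List[List[float]]) -> float:
    """No schedule can finish before the busiest sender or receiver is done."""
    n = len(cost)
    send_load = [sum(cost[i][j] for j in range(n) if j != i) for i in range(n)]
    recv_load = [sum(cost[i][j] for i in range(n) if i != j) for j in range(n)]
    return max(send_load + recv_load)


def greedy_total_exchange(cost: List[List[float]]) -> Tuple[List[Transfer], float]:
    """Greedy list scheduler (illustrative, not the dissertation's algorithm).

    cost[i][j] is the time for node i to send its message to node j.
    Repeatedly schedules the pending transfer that can start earliest,
    preferring longer transfers when start times tie.
    """
    n = len(cost)
    send_free = [0.0] * n  # time at which each node can start its next send
    recv_free = [0.0] * n  # time at which each node can start its next receive
    pending = {(i, j) for i in range(n) for j in range(n) if i != j}
    schedule: List[Transfer] = []

    while pending:
        i, j = min(
            pending,
            key=lambda p: (max(send_free[p[0]], recv_free[p[1]]), -cost[p[0]][p[1]], p),
        )
        start = max(send_free[i], recv_free[j])
        end = start + cost[i][j]
        send_free[i] = end
        recv_free[j] = end
        schedule.append((i, j, start, end))
        pending.remove((i, j))

    makespan = max((end for _, _, _, end in schedule), default=0.0)
    return schedule, makespan


if __name__ == "__main__":
    # Hypothetical 3-node system: per-pair latency (s) and bandwidth (bytes/s).
    latency = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
    bandwidth = [[1.0, 4.0, 2.0], [4.0, 1.0, 8.0], [2.0, 8.0, 1.0]]
    size = 8.0  # bytes per message (assumed uniform)
    n = len(latency)
    cost = [
        [0.0 if i == j else transfer_time(latency[i][j], bandwidth[i][j], size)
         for j in range(n)]
        for i in range(n)
    ]
    schedule, makespan = greedy_total_exchange(cost)
    for sender, receiver, start, end in schedule:
        print(f"{sender} -> {receiver}: [{start:.1f}, {end:.1f})")
    print("makespan:", makespan, "lower bound:", lower_bound(cost))
```

Comparing the resulting completion time with lower_bound gives a quick sense of how far a given schedule is from the best possible one. In a run-time setting, the latency and bandwidth tables would be refreshed from the directory service before each schedule is computed.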