Efficient Parallel Computation on the Reduced Mesh of Trees Organization

Abstract. Optimal and near-optimal parallel algorithms for several fundamental problems are proposed for a parallel organization consisting of n processors, each having access to a row and a column of an n × n array of memory modules. Parallel computations are implemented on such an organization by decomposing them into alternating orthogonal processing phases. A number of efficient data movement techniques are developed for the proposed organization, which lead to optimal or near-optimal solutions to several communication-intensive problems such as sorting, performing permutations, list ranking (data-dependent parallel prefix), and problems on graphs represented by an unsorted list of n^2 edges. It is also shown that the proposed organization is capable of simulating any fixed-degree network on n^2 processors with an O(n) loss in time, which is optimal. Finally, an enhanced organization having p processors, 1 ≤ p ≤ n^2, and O(n^2) memory locations is presented, and is shown to provide optimal speedups for adjacency-matrix-based graph problems for any number of processors in the range [1, n^{3/2}].
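The idea of alternating orthogonal processing phases can be illustrated with a small sequential simulation. The sketch below is not an algorithm taken from the paper: the function name `row_column_sum` and the choice of summation as the task are assumptions made purely for illustration. It models n processors sharing an n × n array of memory modules, where processor i touches only row i during a row phase and only column i during a column phase.

```python
def row_column_sum(memory):
    """Sum all n*n values using one row phase followed by one column phase.

    `memory` is an n x n list of lists standing in for the array of
    memory modules. Each loop iteration plays the role of one processor;
    on the real organization the iterations of a phase would run in parallel.
    """
    n = len(memory)
    work = [row[:] for row in memory]  # copy so the caller's data is untouched

    # Row phase: processor i reads only row i and leaves its partial
    # sum in module (i, 0). Each processor does O(n) work.
    for i in range(n):
        work[i][0] = sum(work[i][j] for j in range(n))

    # Column phase: processor 0 reads only column 0, which now holds
    # the n partial sums, and combines them in O(n) time.
    return sum(work[i][0] for i in range(n))


if __name__ == "__main__":
    print(row_column_sum([[1, 2], [3, 4]]))
```

In this toy setting each phase costs O(n) time with n processors, which matches the trivial lower bound of n^2 / n = n steps for touching n^2 items.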