An Optimal Index Reshuffle Algorithm for Multidimensional Arrays and Its Applications for Parallel Architectures

Reshuffling the elements of a multidimensional array according to an index operation traditionally requires an auxiliary buffer of the same size as the original array. We describe a new in-place algorithm based on vacancy-tracking cycles that minimizes memory accesses. It eliminates the buffer array and the associated copy-back, which speeds up the reshuffle significantly for large arrays. On shared-memory multiprocessor computers, the algorithm can be parallelized with a multithreaded approach. On distributed-memory multiprocessor computers, an index reshuffle of a distributed multidimensional array amounts to a remapping of processor domains and is carried out by combining the in-place local algorithm with a global exchange algorithm. Implementations and test results on the CRAY T3E and IBM SP indicate the effectiveness of the algorithm.
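To make the idea of vacancy tracking concrete, the following is a minimal sketch for the simplest index reshuffle, a two-dimensional transpose of a row-major array stored in a flat list. It is an illustration under stated assumptions, not the paper's implementation: the function name `transpose_in_place`, the row-major layout, and the "smallest index in the cycle" test used to start each cycle exactly once are all choices made here for the example. Each cycle saves one element to open a vacancy, repeatedly moves the element that belongs in the vacancy into it, and finally drops the saved element into the last vacancy, so every element is written once and no full-size buffer is needed.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a row-major rows x cols matrix held in the flat list `a`.

    Illustrative sketch only; names and layout are assumptions of this example.
    After the call, `a` holds the cols x rows transpose in row-major order.
    """
    n = rows * cols
    if len(a) != n:
        raise ValueError("len(a) must equal rows * cols")
    if n < 2:
        return a
    last = n - 1  # positions 0 and n-1 never move under a transpose

    def source_of(dest):
        # The element that must end up at flat position `dest` currently
        # sits at flat position (dest * cols) mod (n - 1).
        return (dest * cols) % last

    def starts_cycle(start):
        # Assumed selection rule: run a cycle only from its smallest index,
        # so each cycle is processed once without a per-element flag array.
        j = source_of(start)
        while j > start:
            j = source_of(j)
        return j == start

    for start in range(1, last):
        if not starts_cycle(start):
            continue
        saved = a[start]          # open a vacancy at `start`
        vacancy = start
        while True:
            src = source_of(vacancy)
            if src == start:      # the cycle has closed
                break
            a[vacancy] = a[src]   # fill the vacancy; it moves to `src`
            vacancy = src
        a[vacancy] = saved        # the saved element fills the last vacancy
    return a


if __name__ == "__main__":
    m = [1, 2, 3,
         4, 5, 6]                 # 2 x 3, row-major
    print(transpose_in_place(m, 2, 3))  # [1, 4, 2, 5, 3, 6], i.e. 3 x 2
```

The cycle-start test trades extra index arithmetic for zero auxiliary storage; a precomputed list of cycle starting positions would avoid the repeated walks at the cost of a small table.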
