MPI and OpenMP Paradigms on Cluster of SMP Architectures: The Vacancy Tracking Algorithm for Multi-Dimensional Array Transposition

We investigate remapping multi-dimensional arrays on clusters of SMP architectures under the OpenMP, MPI, and hybrid paradigms. The traditional method of array transposition needs an auxiliary array of the same size and a copy-back stage. We recently developed an in-place method using vacancy tracking cycles. Extensive comparisons show that the vacancy tracking algorithm outperforms the traditional two-array method. Because the vacancy tracking cycles are independent of one another, the in-place method parallelizes efficiently on SMP architectures at the node level. The performance of multi-threaded parallelism using OpenMP is tested with different scheduling methods and different numbers of threads. The vacancy tracking method is also parallelized using pure MPI and hybrid MPI/OpenMP. At the node level, pure OpenMP outperforms pure MPI by a factor of 2.76. Across the entire cluster of SMP nodes, the hybrid MPI/OpenMP implementation outperforms pure MPI by a factor of 4.44, demonstrating the validity of the parallel paradigm of mixing MPI with OpenMP.
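To illustrate the idea of vacancy tracking cycles, here is a minimal serial sketch for the two-dimensional case. It is not the authors' implementation. It assumes a row-major n1 x n2 array stored in a flat list, and the function name `transpose_in_place` and the `visited` flags are hypothetical choices made for this example. The element that belongs at flat index v of the transposed array comes from index (v * n2) mod (n1 * n2 - 1), so each cycle can be followed by moving one element at a time into the current vacancy.

```python
def transpose_in_place(a, n1, n2):
    """Transpose a row-major n1 x n2 array held in the flat list `a`.

    After the call, `a` holds the n2 x n1 transpose in row-major order.
    Only one saved element per cycle is needed for the data itself.
    """
    n = n1 * n2
    # Assumption for this sketch: a flag per element marks finished cycles.
    # This bookkeeping is a simplification, not part of the source method.
    visited = [False] * n

    # Indices 0 and n - 1 never move, so cycles start between them.
    for start in range(1, n - 1):
        if visited[start]:
            continue
        saved = a[start]      # taking this element out creates the vacancy
        vacancy = start
        while True:
            visited[vacancy] = True
            # Index of the element that belongs in the current vacancy.
            source = (vacancy * n2) % (n - 1)
            if source == start:
                a[vacancy] = saved   # cycle closes with the saved element
                break
            a[vacancy] = a[source]   # fill the vacancy
            vacancy = source         # the vacancy moves to the source slot
    return a


# Usage: a 2 x 3 array [[0, 1, 2], [3, 4, 5]] becomes 3 x 2.
example = transpose_in_place(list(range(6)), 2, 3)
```

Each pass of the outer loop that is not skipped handles one complete cycle, and no two cycles touch the same element. That independence is what makes it possible to hand different cycles to different threads, as the abstract describes for the OpenMP version.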
