An Efficient Algorithm for Out-of-Core Matrix Transposition

Efficient transposition of out-of-core matrices has been widely studied, with most efforts focused on reducing the number of I/O operations. However, on state-of-the-art architectures, the memory-to-memory data transfer time and the index computation time are also significant components of the overall time. In this paper, we propose an algorithm that accounts for both the index computation time and the I/O time. It reduces the total execution time by reducing the number of I/O operations and eliminating the index computation. To do so, it employs two techniques: writing the data onto disk in pre-defined patterns and balancing the number of disk read and write operations. The index computation, an expensive operation involving two divisions and a multiplication, is eliminated by partitioning the memory into read and write buffers. The expensive in-processor permutation is replaced by data collection from the read buffer to the write buffer. Although this partitioning may increase the number of I/O operations in some cases, it yields an overall reduction in execution time because the expensive index computation is eliminated. We analyze our algorithm using the well-known linear model and the parallel disk model. Experimental results on a Sun Enterprise, an SGI R12000 and a Pentium III show that our algorithm reduces the overall execution time by up to 50% compared with the best known algorithms in the literature.
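To make the cost of index computation concrete, the following Python sketch contrasts two ways of transposing a small in-memory, row-major matrix. It is an illustration only, not the algorithm proposed in the paper: the function names, the flat-list representation, and the interpretation of "two divisions and a multiplication" as a quotient, a remainder, and a product are all assumptions made here. The first function computes a destination index for every element; the second collects elements from a read buffer into a separate write buffer using a fixed stride, so the inner loop needs only additions.

```python
def transpose_by_index(src, rows, cols):
    """Place each element using a computed destination index.

    Assumed reading of the abstract: each element costs one division,
    one remainder (a second division), and one multiplication.
    """
    dst = [None] * (rows * cols)
    for k in range(rows * cols):
        r = k // cols                # division: source row
        c = k % cols                 # second division: source column
        dst[c * rows + r] = src[k]   # multiplication: destination offset
    return dst


def transpose_by_collection(read_buf, rows, cols):
    """Collect elements from a read buffer into a write buffer.

    Hypothetical illustration: the read position advances by a fixed
    stride, so no division or multiplication occurs per element.
    """
    write_buf = []
    for c in range(cols):
        pos = c                      # start of column c in the read buffer
        for _ in range(rows):
            write_buf.append(read_buf[pos])
            pos += cols              # step down one row
    return write_buf


if __name__ == "__main__":
    matrix = list(range(6))          # 2 x 3 matrix stored row by row
    print(transpose_by_index(matrix, 2, 3))
    print(transpose_by_collection(matrix, 2, 3))
```

Both functions produce the same transposed layout. The second depends on having separate source and destination buffers, which corresponds loosely to the read/write memory partition the abstract describes.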
