Languages and Compilers for Parallel Computing

Although there has been some experimentation with Java as a language for numerically intensive computing, many perceive the language as unsuited to such work. In this paper we show how optimizing array bounds checks and null pointer checks creates loop nests to which aggressive optimizations can be applied. Applying these optimizations by hand to a simple matrix-multiply test case yields Java-compliant programs whose performance exceeds 500 Mflops on an RS/6000 SP 332 MHz SMP node. We also report the effect of each optimization on performance. Since all of these optimizations can be automated, we conclude that Java will soon be a serious contender for numerically intensive computing.
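As an illustration of the idea, the following is a minimal sketch of a matrix multiply in which the null pointer and bounds conditions are tested once, up front, so that the main loop nest cannot raise an exception and may be reordered freely. It is not the paper's actual code: the class and method names (`MatMulSketch`, `isSafe`, `multiplyChecked`, `multiply`) are hypothetical, and the loop interchange shown is only one example of a transformation that becomes legal inside such an exception-free region. When the test fails, the sketch falls back to the original loop so that exceptions are thrown exactly where Java semantics require.

```java
// Hypothetical sketch; all names here are illustrative assumptions.
final class MatMulSketch {

    // Baseline i-j-k loop: the JVM implicitly checks every access for
    // null references and out-of-bounds indices.
    static void multiplyChecked(double[][] a, double[][] b, double[][] c,
                                int m, int n, int p) {
        for (int i = 0; i < m; i++)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < n; k++)
                    c[i][j] += a[i][k] * b[k][j];
    }

    // Returns true only if no access in the (m x n) * (n x p) multiply can
    // throw, and reordering the loops cannot change the result.
    static boolean isSafe(double[][] a, double[][] b, double[][] c,
                          int m, int n, int p) {
        if (a == null || b == null || c == null) return false;
        if (a.length < m || b.length < n || c.length < m) return false;
        for (int i = 0; i < m; i++) {
            if (a[i] == null || a[i].length < n) return false;
            if (c[i] == null || c[i].length < p) return false;
        }
        for (int k = 0; k < n; k++)
            if (b[k] == null || b[k].length < p) return false;
        // Reordering preserves results only if c shares no row with a or b.
        for (int i = 0; i < m; i++) {
            for (int r = 0; r < m; r++) if (c[i] == a[r]) return false;
            for (int r = 0; r < n; r++) if (c[i] == b[r]) return false;
        }
        return true;
    }

    static void multiply(double[][] a, double[][] b, double[][] c,
                         int m, int n, int p) {
        if (isSafe(a, b, c, m, n, p)) {
            // Exception-free region: interchange to i-k-j order so the inner
            // loop walks rows of b and c contiguously. Each c[i][j] still
            // accumulates its terms in increasing k, as in the baseline.
            for (int i = 0; i < m; i++) {
                double[] ci = c[i];
                for (int k = 0; k < n; k++) {
                    double aik = a[i][k];
                    double[] bk = b[k];
                    for (int j = 0; j < p; j++)
                        ci[j] += aik * bk[j];
                }
            }
        } else {
            // Fallback: original order, so any exception is raised at the
            // same point, after the same partial updates, as in plain Java.
            multiplyChecked(a, b, c, m, n, p);
        }
    }
}
```

A caller would invoke `MatMulSketch.multiply(a, b, c, m, n, p)` in place of the plain loop. The up-front test adds extra work on the order of m(m+n) comparisons, which is small next to the m·n·p multiply-adds of the loop nest itself.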
