Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications

Numerical reproducibility and stability of large-scale scientific simulations, especially climate modeling, on distributed-memory parallel computers are becoming critical issues. In particular, global summation of distributed arrays is highly susceptible to rounding errors, whose propagation and accumulation cause uncertainty in the final simulation results. We analyzed several accurate summation methods and found two to be particularly effective at improving (or ensuring) reproducibility and stability: Kahan's self-compensated summation and Bailey's double-double precision summation. We provide an MPI operator, MPI_SUMDD, that works with MPI collective operations to ensure a scalable implementation on a large number of processors. The resulting methods are simple to adopt in practical codes, and they apply not only to global summations but also to vector-vector dot products and matrix-vector or matrix-matrix operations.
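As an illustration of the two summation methods named above, the following is a minimal serial sketch in Python. It is not the authors' implementation and does not use MPI: the function names (`kahan_sum`, `two_sum`, `dd_add`, `dd_sum`, `reduce_dd`) are hypothetical, and `reduce_dd` only imitates, on in-memory chunks, the kind of pairwise combination that a reduction operator such as MPI_SUMDD would presumably perform across processes. The double-double addition shown is a common simplified variant and is assumed here for illustration.

```python
def kahan_sum(values):
    """Kahan's compensated summation: carry the rounding error forward."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - comp            # apply the correction to the next term
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost (algebraically zero)
        total = t
    return total


def two_sum(a, b):
    """Error-free transformation: return (s, err) with s + err == a + b exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def dd_add(x, y):
    """Add two double-double numbers, each stored as a (hi, lo) pair."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)  # renormalize so that |lo| is small relative to hi


def dd_sum(values):
    """Sum ordinary doubles using a double-double accumulator."""
    acc = (0.0, 0.0)
    for v in values:
        acc = dd_add(acc, (v, 0.0))
    return acc


def reduce_dd(chunks):
    """Combine per-chunk double-double partial sums (serial stand-in for a reduction)."""
    acc = (0.0, 0.0)
    for chunk in chunks:
        acc = dd_add(acc, dd_sum(chunk))
    return acc
```

For example, `dd_sum([1e16, 1.0, -1e16])` returns `(1.0, 0.0)`, whereas plain left-to-right double-precision addition of the same three values gives `0.0`. Splitting an array into different numbers of chunks and combining them with `reduce_dd` is a simple way to check how sensitive the result is to the data decomposition.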
