Efficient Parallel I/O in Community Atmosphere Model (CAM)

Century-long global climate simulations at high resolution generate large amounts of data on parallel architectures. Currently, the Community Atmosphere Model (CAM), the atmospheric component of the NCAR Community Climate System Model (CCSM), uses sequential I/O, which causes a serious bottleneck for these simulations. In this paper we describe the development of parallel I/O for CAM. The parallel I/O combines a novel remapping of 3-D arrays with the parallel netCDF library as the I/O interface. Because of the parallel decomposition, CAM history variables are stored in the disk file in a different index order than in CPU-resident memory, so an index reshuffle is done on the fly. Our strategy is first to remap the 3-D arrays from their native decomposition to a z-decomposition on the distributed architecture, and from there to write the data out to disk. Because the z-decomposition is consistent with the last array dimension, the data transfer can occur at maximum block sizes and therefore achieve maximum I/O bandwidth. We also incorporate the parallel netCDF library recently developed at Argonne/Northwestern as the collective I/O interface. This resolves a long-standing issue, because the netCDF data format is used extensively in climate system models. We test the performance of the new parallel I/O on five platforms (SP3, SP4, SP5, Cray X1E, BlueGene/L) with up to 1024 processors. More than four realistic model resolutions are examined, including EUL T85 (~1.4°), FV-B (2° × 2.5°), FV-C (1° × 1.25°), and FV-D (0.5° × 0.625°). For a standard single history output of a CAM 3.1 FV-D resolution run (multiple 2-D and 3-D arrays with a total size of 4.1 GB), our parallel I/O is faster than the existing I/O by a factor of 14 on IBM SP3 and by a factor of 9 on IBM SP5.
For a typical century-long simulation at FV-D resolution on IBM SP5 with daily output, we estimate that the I/O time can be reduced from more than 8 days (wall clock) to less than 1 day. The parallel I/O is also implemented on IBM BlueGene/L and results are shown, whereas the existing sequential I/O fails on that platform because of memory limitations.
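
The benefit of the z-decomposition can be illustrated with a small serial sketch. It is not the CAM implementation: the function names, the choice of a latitude (y) split as the "native" decomposition, and the use of Python dictionaries in place of an MPI all-to-all exchange are all assumptions made for illustration. The sketch counts how many separate contiguous file segments each process would have to write when the file stores a 3-D array with x varying fastest and z slowest.

```python
def split_range(n, parts):
    """Split range(n) into `parts` nearly equal contiguous (start, stop) pairs."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def file_offset(i, j, k, nx, ny):
    """Linear position of element (i, j, k) in a file with x fastest, z slowest."""
    return i + nx * (j + ny * k)


def native_blocks(nx, ny, nz, nranks):
    """Hypothetical native layout: each rank owns a band of y for all x and z."""
    blocks = []
    for j0, j1 in split_range(ny, nranks):
        block = {(i, j, k): float(file_offset(i, j, k, nx, ny))
                 for k in range(nz) for j in range(j0, j1) for i in range(nx)}
        blocks.append(block)
    return blocks


def remap_to_z(blocks, nz):
    """Simulated all-to-all: send every element to the rank owning its z level."""
    nranks = len(blocks)
    owner = {}
    for rank, (k0, k1) in enumerate(split_range(nz, nranks)):
        for k in range(k0, k1):
            owner[k] = rank
    remapped = [dict() for _ in range(nranks)]
    for block in blocks:
        for (i, j, k), value in block.items():
            remapped[owner[k]][(i, j, k)] = value
    return remapped


def contiguous_runs(block, nx, ny):
    """Number of separate contiguous file segments needed to write one block."""
    offsets = sorted(file_offset(i, j, k, nx, ny) for (i, j, k) in block)
    if not offsets:
        return 0
    return 1 + sum(1 for a, b in zip(offsets, offsets[1:]) if b != a + 1)


if __name__ == "__main__":
    nx, ny, nz, nranks = 4, 6, 8, 2
    native = native_blocks(nx, ny, nz, nranks)
    zdecomp = remap_to_z(native, nz)
    print("segments per rank, native:", [contiguous_runs(b, nx, ny) for b in native])
    print("segments per rank, z-decomposition:", [contiguous_runs(b, nx, ny) for b in zdecomp])
```

In this toy setup each rank in the native layout must write one segment per vertical level, while after the remap each rank holds a single contiguous slab of the file. This is the sense in which the transfer can proceed at maximum block size.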
