Accelerating the iterative linear solver for reservoir simulation on multicore architectures

Modern petroleum reservoir simulation serves as a primary tool for quantitatively managing reservoir production and planning new fields. It involves repeatedly solving linear systems built from the Jacobian of a set of strongly nonlinear partial differential equations governing mass and energy conduction and conservation. Most existing reservoir simulators adopt an iterative solver with multiple stages of preconditioners, in which incomplete LU (ILU) factorization is an outstanding universal smoother. However, as the number of degrees of freedom per grid cell grows, ILU usually becomes the bottleneck of the solver. Moreover, ILU is difficult to parallelize because of its inherent data dependencies. In this paper, we develop a sparse iterative solver with parallelized ILU and triangular solve based on a block-wise data structure. Compared with a state-of-the-art iterative solver on 14 industrial reservoir simulation matrices, the proposed ILU is on average 5.2x faster thanks to the block-wise data structure, which leads to a 2.2x speedup in total solver runtime. In addition, parallel ILU and triangular solve are developed to further accelerate the solver. To tackle the strong data dependencies in ILU and triangular solve, we first partition the algorithm into separate tasks and construct a data flow graph to represent the dependencies among them. Tasks are then scheduled in parallel according to the topological order of the data flow graph. On an 8-thread multicore architecture, we achieve a further 3.6x speedup on ILU factorization and 3.3x on triangular solve, with good scalability.
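
The idea of scheduling tasks along the topological order of a dependency graph can be illustrated with a minimal sketch of level scheduling for a sparse lower-triangular solve. This is not the authors' implementation: it assumes a scalar (not block-wise) CSR matrix with an explicitly stored diagonal, treats each row as one task, and runs the rows of a level in a plain loop where a real solver would dispatch them to threads. The function names `build_levels` and `solve_lower` and the example matrix are hypothetical.

```python
from collections import defaultdict


def build_levels(indptr, indices):
    """Group rows of a lower-triangular CSR matrix into dependency levels.

    Row i depends on row j whenever L[i, j] != 0 with j < i. A row's level is
    one more than the highest level among the rows it depends on, so all rows
    in the same level are mutually independent.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):  # rows j < i are already final when row i is visited
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    groups = defaultdict(list)
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return [groups[lv] for lv in sorted(groups)]


def solve_lower(indptr, indices, data, b, levels):
    """Solve L x = b by sweeping the levels in topological order."""
    x = [0.0] * len(b)
    for rows in levels:
        # Rows within one level could run concurrently (assumption: a thread
        # pool would replace this inner loop in a parallel implementation).
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    acc -= data[k] * x[j]
            x[i] = acc / diag
    return x


# Hypothetical 4x4 example:
# [2 0 0 0]
# [0 4 0 0]
# [1 0 1 0]
# [0 2 1 2]
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 4.0, 1.0, 1.0, 2.0, 1.0, 2.0]
b = [2.0, 4.0, 3.0, 7.0]

levels = build_levels(indptr, indices)
x = solve_lower(indptr, indices, data, b, levels)
```

In this example rows 0 and 1 have no dependencies and share the first level, while rows 2 and 3 must wait for earlier levels. The available parallelism is bounded by the width of each level, which is why the sparsity pattern of the factors largely determines how well such a scheme scales.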
