Parallel Exact Inference on a CPU-GPGPU Heterogeneous System

Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of inference increases dramatically with the parameters of the graphical model, and achieving scalability across hundreds of threads remains a fundamental challenge. In this paper, we use a lightweight scheduler hosted by the CPU to allocate cliques of a junction tree to the GPGPU at run time. The scheduler dynamically merges multiple small cliques or splits large cliques so as to maximize the utilization of GPGPU resources. We implement node-level primitives on the GPGPU to process the cliques assigned by the CPU. We propose a conflict-free potential table organization and an efficient data layout for coalesced memory accesses. In addition, we develop a double-buffering-based asynchronous data transfer scheme between the CPU and the GPGPU to overlap clique processing on the GPGPU with data transfer and scheduling activities. Our implementation achieves a 30x speedup compared with state-of-the-art multicore processors.
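The sketch below illustrates the general double-buffering pattern referred to in the abstract: two pinned host buffers and two CUDA streams alternate, so that copying the next clique's potential table can overlap with processing the previous one on the GPGPU. This is a minimal sketch under assumed names and sizes (process_clique, Clique count NUM_CLIQUES, table size TABLE_SIZE are illustrative placeholders), not the paper's actual implementation.

```cuda
// Hypothetical sketch of double-buffered, asynchronous clique transfer.
// The kernel stands in for the node-level primitives (marginalization,
// extension, multiplication, division); sizes and names are assumptions.
#include <cuda_runtime.h>
#include <cstdio>

#define NUM_CLIQUES 64
#define TABLE_SIZE  (1 << 16)   // entries per potential table (assumed)

__global__ void process_clique(float *table, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) table[i] *= 1.0f;   // real primitive work would go here
}

int main() {
    float *h_tables[2], *d_tables[2];
    cudaStream_t streams[2];

    // Two pinned host buffers and two device buffers form the double buffer.
    for (int b = 0; b < 2; ++b) {
        cudaMallocHost((void **)&h_tables[b], TABLE_SIZE * sizeof(float));
        cudaMalloc((void **)&d_tables[b], TABLE_SIZE * sizeof(float));
        cudaStreamCreate(&streams[b]);
    }

    for (int c = 0; c < NUM_CLIQUES; ++c) {
        int b = c & 1;                      // alternate between the two buffers

        // Make sure the previous clique that used this buffer is done
        // before the CPU-side scheduler refills it with the next clique
        // (possibly merged or split to fit the GPGPU).
        cudaStreamSynchronize(streams[b]);

        // Stage the copies and the kernel in this buffer's stream; while
        // this stream works, the other stream's clique is still in flight.
        cudaMemcpyAsync(d_tables[b], h_tables[b],
                        TABLE_SIZE * sizeof(float),
                        cudaMemcpyHostToDevice, streams[b]);
        process_clique<<<(TABLE_SIZE + 255) / 256, 256, 0, streams[b]>>>(
            d_tables[b], TABLE_SIZE);
        cudaMemcpyAsync(h_tables[b], d_tables[b],
                        TABLE_SIZE * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[b]);
    }
    cudaDeviceSynchronize();

    for (int b = 0; b < 2; ++b) {
        cudaStreamDestroy(streams[b]);
        cudaFree(d_tables[b]);
        cudaFreeHost(h_tables[b]);
    }
    printf("processed %d cliques\n", NUM_CLIQUES);
    return 0;
}
```

Pinned (page-locked) host memory is what allows cudaMemcpyAsync to truly overlap with kernel execution in another stream; the same pattern leaves the CPU free to run the scheduler while the GPGPU is busy.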
