Data Parallel Implementation of Belief Propagation in Factor Graphs on Multi-core Platforms

We investigate data parallel techniques for belief propagation in acyclic factor graphs on multi-core systems. Belief propagation is a key inference algorithm for factor graphs, a class of probabilistic graphical models that has found applications in many domains. In this paper, we explore data parallelism in the basic operations over potential tables that belief propagation relies on, and we develop data parallel techniques for these table operations on shared memory platforms. We then propose a complete belief propagation algorithm that uses these table operations to perform exact inference in factor graphs. The proposed algorithms are implemented on state-of-the-art multi-socket multi-core systems with additional NUMA-aware optimizations, and they exhibit good scalability on a representative set of factor graphs. On a four-socket Intel Westmere-EX system with 40 cores, we achieve a 39.5× speedup for the table operations and a 39× speedup for the complete algorithm using factor graphs with large potential tables.
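To make the idea of a data parallel table operation concrete, the following is a minimal sketch of one such operation, marginalization of a potential table, and is not the implementation described in the paper. It assumes the table is stored as a flat row-major list; the function names (`marginalize_chunk`, `marginalize`), the chunking scheme, and the use of a Python thread pool are all illustrative assumptions. Each worker sums its own contiguous slice of the input into a private partial table, and the partial tables are then added together.

```python
from concurrent.futures import ThreadPoolExecutor


def marginalize_chunk(table, cards, drop, start, stop):
    """Sum table[start:stop] into a private partial table with axis `drop` removed.

    `table` is a flat row-major potential table and `cards` lists the
    cardinality of each variable (hypothetical layout, assumed for this sketch).
    """
    inner = 1
    for c in cards[drop + 1:]:
        inner *= c                      # stride of the dropped axis
    d = cards[drop]
    partial = [0.0] * (len(table) // d)
    for i in range(start, stop):
        outer = i // (inner * d)        # combined index of axes before `drop`
        within = i % inner              # combined index of axes after `drop`
        partial[outer * inner + within] += table[i]
    return partial


def marginalize(table, cards, drop, workers=4):
    """Marginalize out variable `drop` by splitting the input into chunks."""
    n = len(table)
    chunk = max(1, (n + workers - 1) // workers)
    ranges = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda r: marginalize_chunk(table, cards, drop, *r), ranges)
        )
    # Reduce: add the per-worker partial tables entry by entry.
    return [sum(col) for col in zip(*partials)]


if __name__ == "__main__":
    # A table over three variables with cardinalities 2, 3, and 2.
    phi = [float(i) for i in range(12)]
    print(marginalize(phi, [2, 3, 2], drop=1))
```

Python threads only illustrate the decomposition here; because of the interpreter lock they will not reproduce the speedups reported above, which come from a native shared memory implementation with NUMA-aware data placement.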
