Parallel exact inference on the Cell Broadband Engine processor

We present the design and implementation of a parallel exact inference algorithm on the Cell Broadband Engine (Cell BE) processor. Exact inference is a key problem in exploring probabilistic graphical models, and its computational complexity depends on the network structure and increases dramatically with clique size. In this paper, we exploit parallelism at multiple levels. We present an efficient scheduler that dynamically partitions large tasks and allocates synergistic processing elements (SPEs). We explore potential table representations and data layouts that optimize DMA transfers between the local store and main memory. We also optimize the computation kernels. Our implementation achieves linear speedup and outperforms state-of-the-art processors such as the AMD Opteron, Intel Xeon and Pentium 4. The methodology proposed in this paper can also be used for online scheduling of directed acyclic graph (DAG) structured computations.
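To make "potential table representation" concrete: in junction-tree exact inference, a clique potential is commonly stored as a flat array with one entry per joint state of the clique variables, and marginalizing onto a separator maps each entry index to a separator index. The sketch below assumes a row-major layout (last variable varies fastest); the function names are hypothetical, and it does not reproduce the paper's actual layout or its DMA optimizations.

```python
import math
from typing import List, Sequence


def row_major_strides(cards: Sequence[int]) -> List[int]:
    """Strides for a row-major flat table (hypothetical helper)."""
    strides = [1] * len(cards)
    for i in range(len(cards) - 2, -1, -1):
        strides[i] = strides[i + 1] * cards[i + 1]
    return strides


def marginalize(table: Sequence[float], clique_vars: Sequence[str],
                cards: Sequence[int], sep_vars: Sequence[str]) -> List[float]:
    """Sum a flat clique potential down to the separator variables.

    Assumes both tables use the row-major layout computed above.
    """
    if len(table) != math.prod(cards):
        raise ValueError("table size does not match variable cardinalities")
    pos = {v: i for i, v in enumerate(clique_vars)}
    clique_strides = row_major_strides(cards)
    sep_cards = [cards[pos[v]] for v in sep_vars]
    sep_strides = row_major_strides(sep_cards)
    out = [0.0] * math.prod(sep_cards)  # prod([]) == 1: full sum
    for idx, value in enumerate(table):
        # Decode each separator variable's state from the clique index,
        # then re-encode it as an index into the separator table.
        j = 0
        for v, stride in zip(sep_vars, sep_strides):
            i = pos[v]
            state = (idx // clique_strides[i]) % cards[i]
            j += state * stride
        out[j] += value
    return out
```

For a clique over A (2 states) and B (3 states) stored as `[1, 2, 3, 4, 5, 6]`, `marginalize(table, ["A", "B"], [2, 3], ["A"])` sums each row to give `[6.0, 15.0]`. Because every entry is visited independently, such loops are natural candidates for splitting across processing elements, with partial sums combined afterwards.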
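The idea of online DAG scheduling with task partitioning can be illustrated with a generic greedy list scheduler. This is an assumption-laden sketch, not the paper's scheduler: workers stand in for SPEs, DMA and scheduling overheads are ignored, and the function name and the `max_chunk` splitting threshold are hypothetical. Tasks become ready once all predecessors finish; a task heavier than `max_chunk` is split into equal chunks that may run on different workers, and each chunk goes to the worker that becomes free earliest.

```python
import heapq
import math
from collections import deque
from typing import Dict, List


def schedule_dag(weights: Dict[str, float], succ: Dict[str, List[str]],
                 num_workers: int, max_chunk: float) -> float:
    """Return the makespan of a greedy list schedule of a task DAG.

    weights: task -> work amount; succ: task -> successor tasks.
    max_chunk (> 0) is a hypothetical threshold for splitting large tasks.
    """
    indeg = {t: 0 for t in weights}
    for t in weights:
        for s in succ.get(t, []):
            indeg[s] += 1
    ready_time = {t: 0.0 for t in weights}  # when all predecessors are done
    finish: Dict[str, float] = {}
    workers = [(0.0, w) for w in range(num_workers)]  # (free time, worker id)
    heapq.heapify(workers)
    ready = deque(t for t in weights if indeg[t] == 0)
    while ready:
        t = ready.popleft()
        # Partition a large task into equal chunks of at most max_chunk work.
        n_chunks = max(1, math.ceil(weights[t] / max_chunk))
        chunk = weights[t] / n_chunks
        done = ready_time[t]
        for _ in range(n_chunks):
            free, wid = heapq.heappop(workers)  # earliest-available worker
            end = max(free, ready_time[t]) + chunk
            heapq.heappush(workers, (end, wid))
            done = max(done, end)
        finish[t] = done
        # Release successors whose predecessors have all completed.
        for s in succ.get(t, []):
            ready_time[s] = max(ready_time[s], done)
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    if len(finish) != len(weights):
        raise ValueError("task graph contains a cycle")
    return max(finish.values(), default=0.0)
```

On a diamond-shaped graph (A with weight 4 feeding B and C with weight 2 each, which both feed D with weight 4), two workers and `max_chunk=2` yield a makespan of 6, while one worker yields 12, the total work. In a junction tree, the tasks would correspond to clique updates and the edges to the propagation order.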
