Self-Adaptive Evidence Propagation on Manycore Processors

Evidence propagation is a major step in exact inference, a key problem in exploring probabilistic graphical models. It is essentially a series of computations between the potential tables in the cliques and separators of a given junction tree. In real applications, the size of the potential tables varies dramatically. Thus, achieving scalability over dozens of threads remains a fundamental challenge for evidence propagation on manycore processors. In this paper, we propose a self-adaptive method for evidence propagation on manycore processors. Given an arbitrary junction tree, we convert evidence propagation in the junction tree into a task dependency graph. The proposed self-adaptive scheduler dynamically adjusts the number of threads used for scheduling or executing tasks according to the task dependency graph. This self-adaptability prevents the schedulers from being too idle or too busy during the scheduling process. We implemented the proposed method on the Sun UltraSPARC T2 (Niagara 2) platform, which supports up to 64 hardware threads. Through a set of experiments, we show that the proposed method scales well with respect to various input junction trees and exhibits superior performance when compared with several baseline methods for evidence propagation.
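To make the conversion step concrete, the following is a minimal sketch of how a junction tree might be turned into a task dependency graph. It assumes the standard two-phase scheme (messages collected from the leaves toward the root, then distributed back out), which the abstract does not spell out. The names `build_task_graph` and `topological_order`, the parent-map input, and the task tuples are all hypothetical, and the sketch covers neither the potential table computations nor the self-adaptive scheduler itself.

```python
from collections import defaultdict


def build_task_graph(parent):
    """Map each task to the list of tasks it depends on.

    `parent` maps every clique to its parent clique; the root maps to None.
    A task is a tuple (phase, source_clique, target_clique). This layout is
    an illustrative assumption, not the paper's data structure.
    """
    children = defaultdict(list)
    root = None
    for clique, par in parent.items():
        if par is None:
            root = clique
        else:
            children[par].append(clique)

    deps = {}
    for clique, par in parent.items():
        if par is None:
            continue
        # Collect: a clique sends to its parent once all its children have sent to it.
        deps[("collect", clique, par)] = [
            ("collect", ch, clique) for ch in children[clique]
        ]
        if par == root:
            # Distribute from the root waits for every collect message into the root.
            deps[("distribute", par, clique)] = [
                ("collect", ch, root) for ch in children[root]
            ]
        else:
            # Distribute from an inner clique waits for the message from its own parent.
            deps[("distribute", par, clique)] = [("distribute", parent[par], par)]
    return deps


def topological_order(deps):
    """Return one valid execution order using Kahn's algorithm."""
    remaining = {task: len(d) for task, d in deps.items()}
    dependents = defaultdict(list)
    for task, d in deps.items():
        for prerequisite in d:
            dependents[prerequisite].append(task)

    ready = [task for task, count in remaining.items() if count == 0]
    order = []
    while ready:
        task = ready.pop()
        order.append(task)
        for nxt in dependents[task]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)
    return order


# Hypothetical four-clique junction tree: A is the root, D hangs below B.
tree = {"A": None, "B": "A", "C": "A", "D": "B"}
task_graph = build_task_graph(tree)
schedule = topological_order(task_graph)
```

In a sketch like this, the tasks in the `ready` list at any moment are the ones that could run in parallel; a scheduler of the kind described above would decide how many threads to devote to dispatching versus executing them.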
