Exploration of parallelism for probabilistic graphical models

Probabilistic graphical models such as Bayesian networks and junction trees are widely used to compactly represent joint probability distributions. They have found applications in a number of domains, including medical diagnosis, credit assessment, and genetics. The computational complexity of exact inference, a key problem in exploring probabilistic graphical models, increases dramatically with the density of the network, the clique width, and the number of states of the random variables. In many cases, exact inference must be performed in real time. In this work, we explore parallelism for exact inference at various granularities on state-of-the-art high-performance computing platforms. We first study parallel techniques for converting an arbitrary Bayesian network into a junction tree. Then, at a fine-grained level, we explore data parallelism in node-level primitives for exact inference in junction trees. Based on the node-level primitives, we develop computation kernels for evidence collection and distribution on both clusters and multicore processors. In addition, we propose a junction tree decomposition approach for exact inference on a cluster of processors to explore structural parallelism at a coarse-grained level. To utilize structural parallelism dynamically, we also develop various schedulers for exact inference. Specifically, we develop a centralized scheduler for heterogeneous processors, a lock-free collaborative scheduler for multicore processors, and a hierarchical scheduler with dynamic thread grouping for manycore processors. The schedulers balance the workload across the cores and partition large tasks at runtime to adapt to the processor architecture. Finally, for junction trees offering limited parallelism at both data and structural levels, we propose a pointer-jumping-based method for exact inference to accelerate evidence propagation.
We implemented our proposed methods using Pthreads and the Message Passing Interface (MPI) on various platforms, including clusters, general-purpose multicore processors, heterogeneous multicore processors, and manycore processors. Compared with various baseline algorithms using a representative set of junction trees, our proposed methods exhibit superior performance.