Exploring Weak Dependencies in DAG Scheduling

Many computational solutions can be expressed as directed acyclic graphs (DAGs) with weighted nodes. In parallel computing, a fundamental challenge is to map computing resources to tasks efficiently while preserving the precedence constraints among the tasks. Traditionally, such constraints are preserved by starting a task only after all of its preceding tasks have completed. However, for a class of DAG-structured computations, a task can be partially executed with respect to each preceding task. We call this relationship between tasks a weak dependency. A typical example is exact inference in junction trees, where a clique can be partially updated with respect to any one of its preceding cliques. In this paper, we adapt a traditional DAG scheduling scheme to exploit weak dependencies in a DAG. We perform experiments to study the impact of the weak-dependency-based scheduling method on execution time, using a representative set of task graphs for exact inference in junction trees. For a class of task graphs, on a state-of-the-art general-purpose multicore system, the weak-dependency-based scheduler runs 4x faster than a baseline scheduler based on the traditional scheduling method.
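The difference between the two precedence rules can be illustrated with a small simulation. The sketch below is not the scheduler evaluated in the paper; it is a minimal greedy list scheduler written for illustration only. The function names (`simulate`, `traditional_units`, `weak_units`), the three-task graph, and the assumptions that a task's weight splits evenly across its predecessors and that the resulting partial updates may run concurrently are all hypothetical.

```python
import heapq

def simulate(units, num_workers):
    """Greedy list scheduling. units maps name -> (cost, set of prerequisite names).
    Returns the makespan (time at which the last unit finishes)."""
    remaining = {u: len(deps) for u, (_, deps) in units.items()}
    dependents = {u: [] for u in units}
    for u, (_, deps) in units.items():
        for d in deps:
            dependents[d].append(u)
    ready = sorted(u for u, n in remaining.items() if n == 0)
    running = []          # min-heap of (finish_time, unit)
    now, free = 0.0, num_workers
    while ready or running:
        # Start as many ready units as there are idle workers.
        while ready and free > 0:
            u = ready.pop(0)
            heapq.heappush(running, (now + units[u][0], u))
            free -= 1
        # Advance time to the next completion and release its dependents.
        now, u = heapq.heappop(running)
        free += 1
        for v in dependents[u]:
            remaining[v] -= 1
            if remaining[v] == 0:
                ready.append(v)
    return now

def traditional_units(weights, preds):
    """Traditional rule: a task is one unit that waits for all predecessors."""
    return {t: (weights[t], set(preds.get(t, ()))) for t in weights}

def weak_units(weights, preds):
    """Weak-dependency rule (illustrative assumption): a task is split into one
    partial update per predecessor, each waiting only for that predecessor."""
    parts = {t: [f"{t}<-{p}" for p in preds.get(t, ())] or [t] for t in weights}
    units = {}
    for t, names in parts.items():
        ps = list(preds.get(t, ()))
        if not ps:
            units[t] = (weights[t], set())
        for name, p in zip(names, ps):
            # Assumed even split of the task weight across its predecessors.
            units[name] = (weights[t] / len(ps), set(parts[p]))
    return units

# Hypothetical task graph: C depends on A (long) and B (short).
weights = {"A": 4, "B": 1, "C": 4}
preds = {"C": ["A", "B"]}

traditional = simulate(traditional_units(weights, preds), num_workers=2)
weak = simulate(weak_units(weights, preds), num_workers=2)
print("traditional makespan:", traditional)
print("weak-dependency makespan:", weak)
```

On this toy graph with two workers, the traditional rule leaves one worker idle until the long task A finishes, giving a makespan of 8 time units, while the weak-dependency rule lets the partial update of C with respect to B overlap with A, giving 6. The size of the gain depends entirely on the graph shape and the assumed split, so these numbers say nothing about the 4x result reported above.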
