Hierarchical Scheduling of DAG Structured Computations on Manycore Processors with Dynamic Thread Grouping

Many computational solutions can be expressed as directed acyclic graphs (DAGs) with weighted nodes. In parallel computing, scheduling such DAGs onto manycore processors remains a fundamental challenge, since synchronizing dozens of threads while preserving precedence constraints can dramatically degrade performance. To improve scheduling performance on manycore processors, we propose a hierarchical scheduling method with dynamic thread grouping, which schedules DAG structured computations at three levels. At the top level, a supermanager separates threads into groups, each consisting of a manager thread and several worker threads. The supermanager dynamically merges and partitions the groups to adapt the scheduler to the input task dependency graph. Through group merging and partitioning, the proposed scheduler can behave as a centralized scheduler, a distributed scheduler, or anything in between, depending on the input graph. At the group level, managers collaboratively schedule tasks for their workers. At the within-group level, workers perform self-scheduling within their respective groups and execute tasks. We evaluate the proposed scheduler on the Sun UltraSPARC T2 (Niagara 2) platform, which supports up to 64 hardware threads. Across a variety of input task dependency graphs, the proposed scheduler outperforms the baseline methods it is compared against, including typical centralized and distributed schedulers.
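The three levels described above can be illustrated with a small, single-threaded Python model. This is only a sketch under stated assumptions, not the scheduler evaluated in the paper: the names `Group`, `merge`, `partition`, and `run` are hypothetical, real threads and synchronization are omitted, node weights are ignored, and the "least-loaded group" rule used for manager collaboration is an assumed stand-in for whatever policy the authors use.

```python
from collections import deque


class Group:
    """One manager plus its workers (workers are modelled as plain ids)."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.ready = deque()  # ready tasks held by this group's manager


def merge(a, b):
    """Top level (supermanager): fuse two groups into one larger group."""
    g = Group(a.workers + b.workers)
    g.ready.extend(a.ready)
    g.ready.extend(b.ready)
    return g


def partition(g):
    """Top level (supermanager): split one group into two halves."""
    if len(g.workers) < 2:
        raise ValueError("need at least two workers to partition a group")
    mid = len(g.workers) // 2
    left, right = Group(g.workers[:mid]), Group(g.workers[mid:])
    for i, task in enumerate(g.ready):
        (left if i % 2 == 0 else right).ready.append(task)
    return left, right


def run(dag, groups):
    """Execute dag = {task: [successors]}; return a list of (task, worker)."""
    indeg = {t: 0 for t in dag}
    for succs in dag.values():
        for s in succs:
            indeg[s] += 1

    # Group level: managers share the initially ready tasks round-robin.
    roots = [t for t in dag if indeg[t] == 0]
    for i, t in enumerate(roots):
        groups[i % len(groups)].ready.append(t)

    trace = []
    while any(g.ready for g in groups):
        for g in groups:
            # Within-group level: each worker takes the next ready task.
            for w in g.workers:
                if not g.ready:
                    break
                t = g.popleft() if False else g.ready.popleft()
                trace.append((t, w))
                for s in dag[t]:
                    indeg[s] -= 1
                    if indeg[s] == 0:
                        # Assumed policy: hand the newly ready task to the
                        # group whose manager currently holds the fewest.
                        min(groups, key=lambda x: len(x.ready)).ready.append(s)
    return trace
```

In this model, passing a single merged group to `run` corresponds to the centralized extreme, while passing many one-worker groups corresponds to the distributed extreme; `merge` and `partition` move between the two.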
