Building a Basic Block Instruction Scheduler with Reinforcement Learning and Rollouts

The execution order of a block of computer instructions on a pipelined machine can make a difference in running time by a factor of two or more. Compilers use heuristic schedulers tailored to each specific implementation of an architecture to achieve the best possible program speed. However, these heuristic schedulers are time-consuming and expensive to build. We present empirical results using both rollouts and reinforcement learning to construct heuristics for scheduling basic blocks. In simulation, the rollout scheduler outperformed a commercial scheduler on all benchmarks tested, and the reinforcement learning scheduler outperformed the commercial scheduler on several benchmarks and performed well on the others. The combined reinforcement learning and rollout approach was also very successful. We also present results from running the schedules on Compaq Alpha machines and show that the simulated results correspond well to the actual run times.
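
The abstract does not describe the rollout procedure itself. As an illustration only, here is a minimal Python sketch of a generic one-step rollout scheduler for a basic block. It assumes a hypothetical single-issue, in-order pipeline in which each instruction stalls until the producers of its operands have finished. The block format (name mapped to latency and predecessor list), the latencies, the uniform-random base policy, and the use of the mean cost over `rollouts` completions are all assumptions made for the sketch, not details taken from the paper. The helper names (`schedule_cost`, `ready`, `random_completion`, `estimate`, `rollout_schedule`) are likewise hypothetical.

```python
import random
from statistics import mean

# Hypothetical block format: instruction name -> (latency in cycles, predecessor names).
# The timing model below is an illustrative assumption, not the paper's Alpha simulator.
Block = dict[str, tuple[int, list[str]]]


def schedule_cost(block: Block, order: list[str]) -> int:
    """Cycles to complete `order` on an assumed single-issue, in-order pipeline:
    one instruction may issue per cycle, and each one stalls until all of its
    predecessors have finished executing."""
    finish: dict[str, int] = {}
    next_issue = 0  # earliest cycle at which the next instruction may issue
    for name in order:
        latency, preds = block[name]
        issue = max([next_issue] + [finish[p] for p in preds])
        finish[name] = issue + latency
        next_issue = issue + 1
    return max(finish.values(), default=0)


def ready(block: Block, done: set[str]) -> list[str]:
    """Unscheduled instructions whose predecessors have all been scheduled."""
    return sorted(name for name, (_, preds) in block.items()
                  if name not in done and all(p in done for p in preds))


def random_completion(block: Block, prefix: list[str], rng: random.Random) -> list[str]:
    """Assumed base policy: extend `prefix` by picking ready instructions uniformly at random."""
    order = list(prefix)
    done = set(order)
    while len(order) < len(block):
        choice = rng.choice(ready(block, done))
        order.append(choice)
        done.add(choice)
    return order


def estimate(block: Block, prefix: list[str], rollouts: int, rng: random.Random) -> float:
    """Mean cost of `rollouts` random completions of `prefix` (lower is better)."""
    return mean(schedule_cost(block, random_completion(block, prefix, rng))
                for _ in range(rollouts))


def rollout_schedule(block: Block, rollouts: int = 20, seed: int = 0) -> list[str]:
    """One-step rollout: at each step, try every ready instruction, score it by
    its rollout estimate, and commit to the lowest-scoring candidate."""
    rng = random.Random(seed)
    order: list[str] = []
    while len(order) < len(block):
        candidates = ready(block, set(order))
        best = min(candidates, key=lambda c: estimate(block, order + [c], rollouts, rng))
        order.append(best)
    return order


# Toy block: two independent load/use pairs (latencies are made up).
block: Block = {
    "ld1": (3, []), "use1": (1, ["ld1"]),
    "ld2": (3, []), "use2": (1, ["ld2"]),
}
print(schedule_cost(block, ["ld1", "use1", "ld2", "use2"]))  # 8
print(schedule_cost(block, rollout_schedule(block)))          # 5
```

On this toy block the naive order waits on each load before issuing the next one, while the rollout schedule issues both loads back to back and hides most of their latency. Rollout is generally described as improving on whatever base policy it simulates, so `random_completion` could be replaced by a stronger heuristic or a learned policy; the abstract does not say exactly how the reinforcement learning and rollout schedulers were combined.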