Distributed policy search reinforcement learning for job-shop scheduling tasks

We interpret job-shop scheduling problems as sequential decision problems that are handled by independent learning agents. These agents act completely decoupled from one another and employ probabilistic dispatching policies, for which we propose a compact representation based on a small set of real-valued parameters. During learning, the agents adapt these parameters using policy gradient reinforcement learning, aiming to improve the performance of the joint policy as measured by a standard scheduling objective function. Moreover, we suggest a lightweight communication mechanism that enhances the agents' capabilities beyond purely reactive job dispatching. We evaluate the effectiveness of our learning approach on various deterministic as well as stochastic job-shop scheduling benchmark problems, demonstrating that the utilisation of policy gradient methods can be effective and beneficial for scheduling problems.
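
To make the idea of a compactly parameterised probabilistic dispatching policy more concrete, the following is a minimal Python sketch, not the authors' implementation: it assumes each agent (machine) holds a linear softmax policy over hand-crafted features of its waiting jobs and adapts its parameter vector with a REINFORCE-style policy gradient estimate. The class name, the feature choices, and the learning rate are illustrative assumptions, not taken from the paper.

```python
import numpy as np

class SoftmaxDispatcher:
    """Sketch of one agent's dispatching policy: linear softmax over job features."""

    def __init__(self, n_features, lr=0.01, rng=None):
        self.theta = np.zeros(n_features)   # compact real-valued parameter vector
        self.lr = lr
        self.rng = rng if rng is not None else np.random.default_rng()

    def _probs(self, features):
        # features: (n_waiting_jobs, n_features) matrix for the jobs queued at this machine
        logits = features @ self.theta
        logits = logits - logits.max()      # numerical stability
        p = np.exp(logits)
        return p / p.sum()

    def dispatch(self, features):
        """Sample which waiting job to process next; also return the score-function gradient."""
        p = self._probs(features)
        j = int(self.rng.choice(len(p), p=p))
        # gradient of log pi(j | features; theta) for a linear softmax policy
        grad_log = features[j] - p @ features
        return j, grad_log

    def update(self, grad_logs, episode_return, baseline=0.0):
        """REINFORCE-style update applied once a complete schedule has been built."""
        for g in grad_logs:
            self.theta += self.lr * (episode_return - baseline) * g


# Toy usage: one machine with three waiting jobs, each described by two features
agent = SoftmaxDispatcher(n_features=2)
feats = np.array([[3.0, 10.0], [1.0, 4.0], [2.0, 7.0]])   # e.g. processing time, remaining work
job, g = agent.dispatch(feats)
agent.update([g], episode_return=-42.0)                    # e.g. negative makespan of the schedule
```

In this sketch, every agent collects the score-function gradients of its own dispatching decisions during one scheduling episode and, at the end, scales them by a shared episode return such as the negative makespan; a running average of past returns can serve as the baseline to reduce the variance of the gradient estimate.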
