Paper Information - Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling

We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay. We formulate a novel top-down approach to scheduling where, given an unknown network and a set of scheduling policies, we use a policy-gradient-based reinforcement learning algorithm to produce a scheduler that performs better than the available atomic policies. We derive convergence results and analyze the finite-time performance of the algorithm. Simulation results show that the algorithm performs well even when the arrival rates are nonstationary and can stabilize the system even when the constituent policies are unstable.

Link to paper: https://arxiv.org/pdf/2102.08201.pdf
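The idea of learning a scheduler on top of a fixed set of atomic policies can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a single server with two queues, Bernoulli arrivals with rate 0.3 per queue, two base policies where policy k always serves queue k (so each one alone leaves the other queue growing without bound), a softmax mixture over the base policies, and a REINFORCE-style update with a clipped, normalised advantage. The names `run_episode`, `train`, and `mean_backlog` are hypothetical helpers.

```python
import math
import random

ARRIVAL_RATES = (0.3, 0.3)  # assumed Bernoulli arrival probability per slot


def softmax(theta):
    """Turn mixture parameters into probabilities over the base policies."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def run_episode(probs, horizon, rng):
    """Simulate `horizon` slots from empty queues.

    Returns the mean total backlog and how often each base policy was used.
    """
    queues = [0, 0]
    counts = [0] * len(probs)
    total_backlog = 0
    for _ in range(horizon):
        # Pick a base policy; base policy k always serves queue k.
        k = rng.choices(range(len(probs)), weights=probs)[0]
        counts[k] += 1
        if queues[k] > 0:
            queues[k] -= 1
        for i, rate in enumerate(ARRIVAL_RATES):
            if rng.random() < rate:
                queues[i] += 1
        total_backlog += sum(queues)
    return total_backlog / horizon, counts


def train(episodes=400, horizon=200, lr=1.0, seed=0):
    """REINFORCE-style descent on mean backlog over the mixture weights."""
    rng = random.Random(seed)
    theta = [1.5, 0.0]  # start biased toward base policy 0
    baseline = None
    for _ in range(episodes):
        probs = softmax(theta)
        cost, counts = run_episode(probs, horizon, rng)
        if baseline is None:
            baseline = cost
        # Normalised and clipped advantage keeps every step bounded.
        adv = max(-1.0, min(1.0, (cost - baseline) / (baseline + 1.0)))
        for k in range(len(theta)):
            # Per-slot average of d/d theta_k log softmax(theta)[chosen].
            score = counts[k] / horizon - probs[k]
            theta[k] -= lr * adv * score  # descend, since cost is minimised
        baseline = 0.9 * baseline + 0.1 * cost
    return softmax(theta)


def mean_backlog(probs, horizon=2000, seed=1):
    """Evaluate a fixed mixture on one long simulated run."""
    cost, _ = run_episode(probs, horizon, random.Random(seed))
    return cost


learned = train()
only_0 = mean_backlog([1.0, 0.0])
only_1 = mean_backlog([0.0, 1.0])
mixed = mean_backlog(learned)
print("learned mixture:", learned)
print("mean backlog, policy 0 only:", only_0)
print("mean backlog, policy 1 only:", only_1)
print("mean backlog, learned mixture:", mixed)
```

In this toy setting, either base policy on its own starves one queue, while a mixture that serves each queue more often than packets arrive keeps both backlogs bounded. That is the sense in which a combined scheduler can outperform every atomic policy it is built from.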

Shie Mannor | Aditya Gopalan | Mohammadi Zaki | Avi Mohan
