Ray RLlib: A Framework for Distributed Reinforcement Learning

Reinforcement learning (RL) training involves deeply nested, highly irregular computation patterns, each of which typically offers opportunities for distributed computation. Current RL libraries offer parallelism at the level of the entire program, coupling all algorithm components together and making existing implementations difficult to scale, combine, and reuse. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate this principle by building RLlib on top of a task-based framework and show that we can implement a wide range of state-of-the-art algorithms on top of a small set of general abstractions. These abstractions are key to composability and reuse in RLlib and do not come at the cost of performance: in our experiments, RLlib matches or exceeds the performance of highly optimized reference implementations. Ray RLlib is available as part of Ray at this https URL.
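To make the top-down hierarchical control pattern concrete, the sketch below uses Ray's core task primitives (ray.remote, ray.get, ray.init) to have a single driver loop fan out parallel rollout tasks and block on their results. This is a minimal illustration of the idea, not RLlib's actual API: the rollout task, the toy linear "policy," and the update rule are hypothetical stand-ins. The point is that each worker's parallelism and resource needs are encapsulated inside a short-running task, so the caller composes them like ordinary function calls.

    import numpy as np
    import ray

    ray.init()

    @ray.remote
    def rollout(weights, num_steps):
        # Hypothetical worker task: collect a batch of experience under
        # the given policy weights. Parallelism and resource requirements
        # are encapsulated here; the driver treats this as a function call.
        rng = np.random.default_rng()
        obs = rng.normal(size=(num_steps, 4))
        rewards = obs @ weights  # toy stand-in for environment reward
        return obs, rewards

    def train(num_iters=10, num_workers=4):
        weights = np.zeros(4)
        for _ in range(num_iters):
            # Top-down hierarchical control: the driver launches parallel
            # rollout tasks, then blocks on the collected results.
            futures = [rollout.remote(weights, 100) for _ in range(num_workers)]
            for obs, rewards in ray.get(futures):
                # Toy update rule, purely for illustration.
                weights += 1e-3 * (obs * rewards[:, None]).mean(axis=0)
        return weights

    if __name__ == "__main__":
        print(train())

Because the driver owns control flow, components such as the rollout task can be swapped, nested, or reused across algorithms without entangling their parallelism with the rest of the program.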
