ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning

Deep reinforcement learning (DRL) has revolutionized learning and actuation in applications such as game playing and robotic control. The cost of data collection, i.e., generating transitions from agent-environment interactions, remains a major challenge for wider DRL adoption in complex real-world problems. Following a cloud-native paradigm to train DRL agents on a GPU cloud platform is a promising solution. In this paper, we present ElegantRL-podracer, a scalable and elastic library for cloud-native deep reinforcement learning that efficiently supports millions of GPU cores to carry out massively parallel training at multiple levels. At the high level, ElegantRL-podracer employs a tournament-based ensemble scheme to orchestrate the training process on hundreds or even thousands of GPUs, scheduling the interactions between a leaderboard and a training pool with hundreds of pods. At the low level, each pod simulates agent-environment interactions in parallel by fully utilizing nearly 7,000 CUDA cores in a single GPU. ElegantRL-podracer features high scalability, elasticity and accessibility by following the development principles of containerization, microservices and MLOps. Using an NVIDIA DGX SuperPOD cloud, we conduct extensive experiments on various tasks in locomotion and stock trading and show that ElegantRL-podracer substantially outperforms RLlib. Our code is available on GitHub [Liu et al., 2021].
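
To make the high-level orchestration concrete, the following is a minimal, self-contained sketch of a tournament-based scheme of the kind the abstract describes: pods repeatedly pull a surviving agent from a leaderboard, train and evaluate it, and submit the result back, while the leaderboard keeps only the top performers. This is not the ElegantRL-podracer API; the names (Leaderboard, pod_step, Entry) are illustrative assumptions, and real GPU-based training and evaluation are replaced by a toy numeric update.

    """Hedged sketch of a tournament-based ensemble scheduler.
    All class/function names are assumptions for illustration only."""
    import heapq
    import random
    from dataclasses import dataclass, field


    @dataclass(order=True)
    class Entry:
        score: float
        agent: dict = field(compare=False)  # stands in for a policy checkpoint


    class Leaderboard:
        """Keeps the top-`capacity` agents by evaluation score."""

        def __init__(self, capacity: int = 4):
            self.capacity = capacity
            self._heap = []  # min-heap of Entry: weakest survivor at index 0

        def submit(self, score: float, agent: dict) -> None:
            heapq.heappush(self._heap, Entry(score, agent))
            if len(self._heap) > self.capacity:
                heapq.heappop(self._heap)  # evict the weakest agent

        def sample(self) -> dict:
            # A pod restarts training from a randomly chosen survivor,
            # so strong agents propagate through the population.
            return random.choice(self._heap).agent

        def best_score(self) -> float:
            return max(entry.score for entry in self._heap)


    def pod_step(board: Leaderboard, generation: int) -> None:
        """One pod's cycle: pull an agent, 'train' it, evaluate, submit back.
        GPU training and rollout are replaced by a toy update here."""
        agent = dict(board.sample())               # copy the checkpoint
        agent["weights"] = agent.get("weights", 0.0) + random.gauss(0.1, 0.5)
        score = agent["weights"] - 0.01 * generation  # stand-in for an eval return
        board.submit(score, agent)


    if __name__ == "__main__":
        board = Leaderboard(capacity=4)
        board.submit(0.0, {"weights": 0.0})        # seed the population
        for gen in range(100):
            for _ in range(8):                     # 8 "pods" per generation
                pod_step(board, gen)
        print(f"best score after tournament: {board.best_score():.3f}")

In the actual library, the toy update would be replaced by a pod running massively parallel agent-environment rollouts and gradient updates on a single GPU, with the leaderboard coordinating hundreds of such pods across the cluster.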

[1] David Bernstein, et al. Containers and Cloud: From LXC to Docker to Kubernetes, 2014, IEEE Cloud Computing.

[2] Neel Sundaresan, et al. Cloud-Native Applications, 2017, IEEE Cloud Computing.

[3] Qian Chen, et al. FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance, 2020, arXiv.

[4] Pieter Abbeel, et al. rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch, 2019, arXiv.

[5] Samuel Kounev, et al. Elasticity in Cloud Computing: What It Is, and What It Is Not, 2013, ICAC.

[6] Michael I. Jordan, et al. RLlib: Abstractions for Distributed Reinforcement Learning, 2017, ICML.

[7] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[8] Pooyan Jamshidi, et al. Microservices Architecture Enables DevOps: Migration to a Cloud-Native Architecture, 2016, IEEE Software.

[9] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.

[10] Anwar Elwalid, et al. FinRL-podracer: High performance and scalable deep reinforcement learning for quantitative finance, 2021, ICAIF.

[11] Tengteng Zhang, et al. Reinforcement learning for robot research: A comprehensive review and open issues, 2021, International Journal of Advanced Robotic Systems.

[12] P. Mell, et al. The NIST Definition of Cloud Computing, 2011.

[13] Matteo Hessel, et al. Podracer architectures for scalable Reinforcement Learning, 2021, arXiv.

[14] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.

[15] Ronny Krashinsky, et al. NVIDIA A100 Tensor Core GPU: Performance and Innovation, 2021, IEEE Micro.

[16] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, arXiv.

[17] Miles Macklin, et al. Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning, 2021, NeurIPS Datasets and Benchmarks.

[18] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.

[19] Marc G. Bellemare, et al. Dopamine: A Research Framework for Deep Reinforcement Learning, 2018, arXiv.

[20] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.

[21] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.

[22] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.