Two geometric input transformation methods for fast online reinforcement learning with neural nets

We apply neural nets with ReLU gates to online reinforcement learning. Our goal is to train these networks incrementally, without computationally expensive experience replay. By studying how individual nodes behave during online training, we find that the global nature of ReLU gates can cause undesirable interference in each node's learning. We propose to reduce this interference with two efficient input transformation methods that are geometric in nature and well matched to the geometric properties of ReLU gates. The first is tile coding, a classic binary encoding scheme originally designed for local generalization based on the topological structure of the input space. The second, EmECS, is a new method that we introduce; it is based on geometric properties of convex sets and a topological embedding of the input space into the boundary of a convex set. We discuss the behavior of the network when it operates on the transformed inputs. We also compare it experimentally with neural nets that do not use these input transformations and with the classic combination of tile coding and a linear function approximator. On several online reinforcement learning tasks, we show that a neural net with tile coding or EmECS achieves not only faster learning but also more accurate approximations. Our results strongly suggest that geometric input transformations of this type can be effective for interference reduction and take us a step closer to fully incremental reinforcement learning with neural nets.
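To make the first transformation concrete, the following is a minimal sketch of grid tile coding as a binary input encoding. It is not the authors' implementation: the function name `tile_code`, the uniform diagonal offsets, and the default numbers of tilings and tiles are assumptions chosen for illustration. Each tiling partitions the (rescaled) input space into a shifted grid, and exactly one tile per tiling is active, so nearby inputs share many active bits while distant inputs share few or none.

```python
import numpy as np

def tile_code(x, low, high, num_tilings=8, tiles_per_dim=4):
    """Encode x as a binary vector with one active tile per tiling.

    Illustrative sketch only; offsets and sizes are assumptions.
    """
    x = np.asarray(x, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    dims = x.size

    # Rescale each coordinate so that one tile has width 1.
    scaled = (x - low) / (high - low) * tiles_per_dim

    # One extra tile per dimension so shifted grids still cover the range.
    side = tiles_per_dim + 1
    tiles_per_tiling = side ** dims

    features = np.zeros(num_tilings * tiles_per_tiling)
    for t in range(num_tilings):
        # Shift tiling t by a fraction of a tile width along every axis.
        offset = t / num_tilings
        idx = np.clip(np.floor(scaled + offset).astype(int), 0, side - 1)

        # Flatten the per-dimension tile indices into a single index.
        flat = 0
        for i in idx:
            flat = flat * side + int(i)

        features[t * tiles_per_tiling + flat] = 1.0
    return features
```

The resulting vector can be fed to a network in place of the raw input. Because only `num_tilings` entries are nonzero, an update made for one input mostly affects weights tied to tiles that nearby inputs also activate, which is the local generalization property the abstract refers to.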
