Paper Information - Meta-learning for Predictive Knowledge Architectures: A Case Study Using TIDBD on a Sensor-rich Robotic Arm

Predictive approaches to modelling the environment have seen recent successes in robotics and other long-lived applications. These predictive knowledge architectures are learned incrementally and online, through interaction with the environment. One challenge for applications of predictive knowledge is the need to tune feature representations and parameter values: no single step size will be appropriate for every prediction. Furthermore, because sensor signals may change in a non-stationary world, predefined step sizes cannot be sufficient for an autonomous agent. In this paper, we explore Temporal-Difference Incremental Delta-Bar-Delta (TIDBD), a meta-learning method for temporal-difference (TD) learning that adapts a vector of step sizes, one per feature, allowing for simultaneous step-size tuning and representation learning. We demonstrate that, for a predictive knowledge application, TIDBD is a viable alternative to tuning step-size parameters, by showing that the performance of TIDBD is comparable to that of TD with an exhaustive parameter search. Performance here is measured as the root mean squared difference between each prediction and the true value, calculated offline. Moreover, TIDBD can perform representation learning, potentially supporting robust learning in the face of failing sensors. The ability of an autonomous agent to adapt its own learning and adjust its representation based on interactions with its environment is a key capability. With its potential to fulfill these desiderata, meta-learning is a promising component for future systems.
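
To make the per-feature step-size idea concrete, the following is a minimal sketch of one TIDBD update together with the offline error measure. The abstract does not state the update equations; the rule below follows the semi-gradient TIDBD(λ) form with accumulating traces as it is commonly presented in the TIDBD literature, and should be read as an assumption rather than as the authors' exact implementation. The function names (`tidbd_step`, `discounted_returns`, `rmse`), the meta step size `theta`, the initial step size of 0.1, and the toy signal are all hypothetical choices made for illustration.

```python
import math

def tidbd_step(w, beta, h, z, phi, phi_next, reward, gamma, lam, theta):
    """One semi-gradient TIDBD(lambda) update, applied in place (assumed form).

    w: weights, beta: log step sizes, h: meta-traces, z: eligibility traces.
    Each feature i has its own step size alpha_i = exp(beta_i).
    """
    v = sum(wi * xi for wi, xi in zip(w, phi))
    v_next = sum(wi * xi for wi, xi in zip(w, phi_next))
    delta = reward + gamma * v_next - v  # TD error
    for i in range(len(w)):
        # Meta-descent on the log step size, driven by the meta-trace h.
        beta[i] += theta * delta * phi[i] * h[i]
        alpha = math.exp(beta[i])
        z[i] = gamma * lam * z[i] + phi[i]  # accumulating trace
        w[i] += alpha * delta * z[i]
        # Decay h where the feature is active, then add the latest update.
        h[i] = h[i] * max(0.0, 1.0 - alpha * phi[i] * z[i]) + alpha * delta * z[i]
    return delta

def discounted_returns(rewards, gamma):
    """True returns, computed offline by a backward pass over a recorded signal."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

def rmse(predictions, targets):
    """Root mean squared difference between predictions and true returns."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets))

# Toy stream (hypothetical): one always-on feature and a constant signal of 1,
# so with gamma = 0.9 the true value is 1 / (1 - 0.9) = 10 at every step.
gamma, lam, theta = 0.9, 0.5, 0.01
w, h, z = [0.0], [0.0], [0.0]
beta = [math.log(0.1)]  # initial step size of 0.1
predictions, rewards = [], []
for _ in range(2000):
    phi = [1.0]
    predictions.append(w[0] * phi[0])  # prediction made before the update
    tidbd_step(w, beta, h, z, phi, phi, 1.0, gamma, lam, theta)
    rewards.append(1.0)

returns = discounted_returns(rewards, gamma)
# Skip the last 100 steps, whose returns are truncated by the end of the record.
error = rmse(predictions[-500:-100], returns[-500:-100])
```

On a real robot the feature vector would come from a function approximator over the sensor readings, and each prediction would typically keep its own `w`, `beta`, `h`, and `z`. The only parameter left to choose by hand in this sketch is the meta step size `theta`, along with the initial value of `beta`.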
