An Anti-Hebbian Learning Rule to Represent Drive Motivations for Reinforcement Learning

We present a motivational system for a reinforcement learning (RL) agent that enables it to balance multiple drives, each satiated by a different type of stimulus. Inspired by drive reduction theory, the system uses Minor Component Analysis (MCA) to model the agent’s internal drive state and modulates incoming stimuli according to how strongly each stimulus satiates the currently active drive. The agent’s policy is continually updated through least-squares temporal difference learning, so that it automatically seeks stimuli that satiate the most active internal drive first, then the next most active drive, and so on. We prove that the algorithm is stable under certain conditions, and experimental results illustrate its behavior.
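
The MCA component described above can be illustrated with a sign-flipped (anti-Hebbian) Oja update. The Python sketch below is not the paper’s exact rule: the per-step renormalization and the learning-rate value are assumptions added here to keep the stochastic update stable.

```python
import numpy as np

def mca_anti_hebbian_step(w, x, eta=0.01):
    """One anti-Hebbian (anti-Oja) minor-component update.

    w   : current estimate of the minor component (1-D array, unit norm)
    x   : incoming zero-mean sample, e.g. a drive-state/stimulus vector (1-D array)
    eta : learning rate (assumed small)
    """
    y = w @ x                      # projection of the sample onto w
    w = w - eta * y * (x - y * w)  # sign-flipped Oja rule: move away from high-variance directions
    return w / np.linalg.norm(w)   # renormalize so the estimate stays on the unit sphere

# Toy usage: samples whose smallest-variance direction is the second axis.
rng = np.random.default_rng(0)
C = np.diag([4.0, 0.1, 2.0])
w = rng.normal(size=3)
w /= np.linalg.norm(w)
for _ in range(5000):
    x = rng.multivariate_normal(np.zeros(3), C)
    w = mca_anti_hebbian_step(w, x)
print(w)  # should align (up to sign) with the minor direction [0, 1, 0]
```

With a small step size, repeatedly applying this update pulls w toward the direction of least variance in the input stream; the renormalization is one common way to prevent the weight norm from collapsing or diverging under the anti-Hebbian term.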
