Artificial Development by Reinforcement Learning Can Benefit From Multiple Motivations

Research on artificial development, reinforcement learning, and intrinsic motivations such as curiosity could profit from the recently developed framework of multi-objective reinforcement learning. Combining these ideas may lead to more realistic artificial models of lifelong learning and goal-directed behavior in animals and humans.
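As a purely illustrative sketch (not a method described in the paper), the snippet below shows one minimal way such a combination could look: a tabular agent keeps separate value estimates for an extrinsic task reward and an intrinsic curiosity-style novelty bonus, and combines them by linear scalarization, one of the simplest multi-objective reinforcement learning strategies. The toy environment, the visit-count bonus, and all parameter values are assumptions made for the example.

```python
# Minimal sketch (illustrative only): tabular Q-learning on a toy chain task
# with two reward channels -- an extrinsic task reward and an intrinsic
# "curiosity" bonus based on visit counts -- combined by linear scalarization.
import numpy as np

n_states, n_actions = 10, 2          # toy chain environment (assumed for the example)
alpha, gamma, eps = 0.1, 0.95, 0.1   # learning rate, discount, exploration rate
w_ext, w_int = 1.0, 0.5              # scalarization weights for the two objectives

Q = np.zeros((n_states, n_actions, 2))   # one value estimate per objective
visits = np.zeros(n_states)              # visit counts drive the curiosity bonus

def step(s, a):
    """Toy dynamics: action 1 moves right, action 0 moves left; goal at the last state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r_ext = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r_ext

rng = np.random.default_rng(0)
for episode in range(200):
    s = 0
    for t in range(50):
        # epsilon-greedy on the scalarized value (weighted sum of both objectives)
        scalar_q = w_ext * Q[s, :, 0] + w_int * Q[s, :, 1]
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(scalar_q))

        s_next, r_ext = step(s, a)
        visits[s_next] += 1
        r_int = 1.0 / np.sqrt(visits[s_next])   # novelty bonus decays with familiarity

        # independent TD updates per objective, greedy w.r.t. the scalarized value
        a_next = int(np.argmax(w_ext * Q[s_next, :, 0] + w_int * Q[s_next, :, 1]))
        for i, r in enumerate((r_ext, r_int)):
            Q[s, a, i] += alpha * (r + gamma * Q[s_next, a_next, i] - Q[s, a, i])
        s = s_next
        if r_ext > 0:
            break
```

Keeping the two value estimates separate, rather than mixing the rewards before learning, is what makes the setup multi-objective: the weights w_ext and w_int can be changed, or replaced by a Pareto-based selection rule, without relearning the individual objectives.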
