Artificial Development by Reinforcement Learning Can Benefit From Multiple Motivations

Research on artificial development, reinforcement learning, and intrinsic motivations such as curiosity could profit from the recently developed framework of multi-objective reinforcement learning. Combining these ideas may lead to more realistic artificial models of lifelong learning and goal-directed behavior in animals and humans.
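As a purely illustrative sketch (not a method described in the paper), the snippet below shows one minimal way such a combination could look: a tabular agent keeps separate value estimates for an extrinsic task reward and an intrinsic curiosity-style novelty bonus, and combines them by linear scalarization, one of the simplest multi-objective reinforcement learning strategies. The toy environment, the visit-count bonus, and all parameter values are assumptions made for the example.

```python
# Minimal sketch (illustrative only): tabular Q-learning on a toy chain task
# with two reward channels -- an extrinsic task reward and an intrinsic
# "curiosity" bonus based on visit counts -- combined by linear scalarization.
import numpy as np

n_states, n_actions = 10, 2          # toy chain environment (assumed for the example)
alpha, gamma, eps = 0.1, 0.95, 0.1   # learning rate, discount, exploration rate
w_ext, w_int = 1.0, 0.5              # scalarization weights for the two objectives

Q = np.zeros((n_states, n_actions, 2))   # one value estimate per objective
visits = np.zeros(n_states)              # visit counts drive the curiosity bonus

def step(s, a):
    """Toy dynamics: action 1 moves right, action 0 moves left; goal at the last state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r_ext = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r_ext

rng = np.random.default_rng(0)
for episode in range(200):
    s = 0
    for t in range(50):
        # epsilon-greedy on the scalarized value (weighted sum of both objectives)
        scalar_q = w_ext * Q[s, :, 0] + w_int * Q[s, :, 1]
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(scalar_q))

        s_next, r_ext = step(s, a)
        visits[s_next] += 1
        r_int = 1.0 / np.sqrt(visits[s_next])   # novelty bonus decays with familiarity

        # independent TD updates per objective, greedy w.r.t. the scalarized value
        a_next = int(np.argmax(w_ext * Q[s_next, :, 0] + w_int * Q[s_next, :, 1]))
        for i, r in enumerate((r_ext, r_int)):
            Q[s, a, i] += alpha * (r + gamma * Q[s_next, a_next, i] - Q[s, a, i])
        s = s_next
        if r_ext > 0:
            break
```

Keeping the two value estimates separate, rather than mixing the rewards before learning, is what makes the setup multi-objective: the weights w_ext and w_int can be changed, or replaced by a Pareto-based selection rule, without relearning the individual objectives.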
