Machine Learning for Motor Skills in Robotics
[1] R. Bellman. Dynamic Programming , 1957, Science.
[2] Robert E. Kalaba,et al. Dynamic Programming and Modern Control Theory , 1966 .
[3] V. Hutson. Integral Equations , 1967, Nature.
[4] Richard Bellman,et al. Introduction to the mathematical theory of control processes , 1967 .
[5] N. Minamide. Minimum error control problem in Banach space , 1969 .
[6] L. Goddard,et al. Operations Research (OR) , 2007 .
[7] L. Weiss. Introduction to the mathematical theory of control processes, Vol. I - Linear equations and quadratic criteria , 1970 .
[8] Stephen R. McReynolds,et al. The computation and theory of optimal control , 1970 .
[9] David Q. Mayne,et al. Differential dynamic programming , 1972, The Mathematical Gazette.
[10] W. Müller. Review of: D. H. Jacobson and D. Q. Mayne, Differential Dynamic Programming (Modern Analytic and Computational Methods in Science and Mathematics, No. 24, American Elsevier, New York, 1970) , 1973 .
[11] L. Hasdorff. Gradient Optimization and Nonlinear Control , 1976 .
[12] George M. Siouris,et al. Applied Optimal Control: Optimization, Estimation, and Control , 1979, IEEE Transactions on Systems, Man, and Cybernetics.
[13] Michael A. Arbib,et al. Perceptual Structures and Distributed Motor Control , 1981 .
[14] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[15] Russell H. Taylor,et al. Automatic Synthesis of Fine-Motion Strategies for Robots , 1984 .
[16] Mark W. Spong,et al. On Pointwise Optimal Control Strategies for Robot Manipulators , 1984 .
[17] A. A. Maciejewski,et al. Obstacle Avoidance , 2005 .
[18] A. Isidori. Nonlinear Control Systems , 1985 .
[19] Mark W. Spong,et al. The control of robot manipulators with bounded input , 1986 .
[20] T. Yoshikawa,et al. Task-Priority Based Redundancy Control of Robot Manipulators , 1987 .
[21] Oussama Khatib,et al. A unified approach for motion and force control of robot manipulators: The operational space formulation , 1987, IEEE J. Robotics Autom..
[22] Peter W. Glynn,et al. Likelihood ratio gradient estimation: an overview , 1987, WSC '87.
[23] P. Ramadge,et al. Supervisory control of a class of discrete event processes , 1987 .
[24] John M. Hollerbach,et al. Local versus global torque optimization of redundant manipulators , 1987, Proceedings. 1987 IEEE International Conference on Robotics and Automation.
[25] John M. Hollerbach,et al. Redundancy resolution of manipulators through torque optimization , 1987, IEEE J. Robotics Autom..
[26] R. Fletcher. Practical Methods of Optimization , 1988 .
[27] Zhaoyu Wang,et al. Global versus Local Optimization in Redundancy Resolution of Robotic Manipulators , 1988, Int. J. Robotics Res..
[28] A. Guez,et al. Solution to the inverse kinematics problem in robotics by neural networks , 1988, IEEE 1988 International Conference on Neural Networks.
[29] Christopher G. Atkeson,et al. Model-Based Control of a Robot Manipulator , 1988 .
[30] Hossein Arsham,et al. Sensitivity analysis and the “what if” problem in simulation analysis , 1989 .
[31] Peter W. Glynn,et al. Optimization Of Stochastic Systems Via Simulation , 1989, 1989 Winter Simulation Conference Proceedings.
[32] S. Shankar Sastry,et al. Dynamic control of redundant manipulators , 1989, J. Field Robotics.
[33] Yoshihiko Nakamura,et al. Advanced robotics - redundancy and optimization , 1990 .
[34] Tsuneo Yoshikawa,et al. Foundations of Robotics: Analysis and Control , 1990 .
[35] Peter W. Glynn,et al. Likelihood ratio gradient estimation for stochastic systems , 1990, CACM.
[36] Jean-Jacques E. Slotine,et al. A general framework for managing multiple tasks in highly redundant robotic systems , 1991, Fifth International Conference on Advanced Robotics 'Robots in Unstructured Environments.
[37] Alessandro De Luca,et al. Learning control for redundant manipulators , 1991, Proceedings. 1991 IEEE International Conference on Robotics and Automation.
[38] Claude Samson,et al. Robot Control: The Task Function Approach , 1991 .
[39] Oliver G. Selfridge,et al. Real-time learning: a ball on a beam , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.
[40] Vijaykumar Gullapalli,et al. Learning Control Under Extreme Uncertainty , 1992, NIPS.
[41] Michael I. Jordan,et al. Forward Models: Supervised Learning with a Distal Teacher , 1992, Cogn. Sci..
[42] Keith L. Doty,et al. A Theory of Generalized Inverses Applied to Robotics , 1993, Int. J. Robotics Res..
[43] Christopher G. Atkeson,et al. Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming , 1993, NIPS.
[44] Dana H. Ballard,et al. Recognizing teleoperated manipulations , 1993, [1993] Proceedings IEEE International Conference on Robotics and Automation.
[45] M. Kawato,et al. Trajectory formation of arm movement by a neural network with forward and inverse dynamics models , 1993 .
[46] S. Grossberg,et al. A Self-Organizing Neural Model of Motor Equivalent Reaching and Tool Use by a Multijoint Arm , 1993, Journal of Cognitive Neuroscience.
[47] Won Jee Chung,et al. Null torque-based dynamic control for kinematically redundant manipulators , 1993, J. Field Robotics.
[48] Mitsuo Kawato,et al. Teaching by Showing in Kendama Based on Optimization Principle , 1994 .
[49] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[50] Andrew G. Barto,et al. Adaptive linear quadratic control using policy iteration , 1994, Proceedings of 1994 American Control Conference - ACC '94.
[51] V. Gullapalli,et al. Acquiring robot skills via reinforcement learning , 1994, IEEE Control Systems.
[52] Vijaykumar Gullapalli,et al. Skillful control under uncertainty via direct reinforcement learning , 1995, Robotics Auton. Syst..
[53] Bruno Siciliano,et al. Modeling and Control of Robot Manipulators , 1995 .
[54] Michael I. Jordan,et al. Reinforcement Learning by Probability Matching , 1995, NIPS 1995.
[55] Christopher M. Bishop,et al. Neural networks for pattern recognition , 1995 .
[56] M. Branicky,et al. Algorithms for optimal hybrid control , 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.
[57] Stefan Schaal,et al. A Kendama learning robot based on a dynamic optimization theory , 1995, Proceedings 4th IEEE International Workshop on Robot and Human Communication.
[58] Nancy A. Lynch,et al. Hybrid I/O automata , 1995, Inf. Comput..
[59] Thomas A. Henzinger,et al. Linear Phase-Portrait Approximations for Nonlinear Hybrid Systems , 1996, Hybrid Systems.
[60] Jonghoon Park,et al. Specification and control of motion for kinematically redundant manipulators , 1995, Proceedings 1995 IEEE/RSJ International Conference on Intelligent Robots and Systems. Human Robot Interaction and Cooperative Robots.
[61] R. Kalaba,et al. Analytical Dynamics: A New Approach , 1996 .
[62] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[63] Suguru Arimoto,et al. Control Theory of Nonlinear Mechanical Systems , 1996 .
[64] S. Schaal,et al. A Kendama Learning Robot Based on Bi-directional Theory , 1996, Neural Networks.
[65] Stephen S. Wilson,et al. Random iterative models , 1996 .
[66] Carlos Canudas de Wit,et al. Theory of Robot Control , 1996 .
[67] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[68] Thomas A. Henzinger,et al. The theory of hybrid automata , 1996, Proceedings 11th Annual IEEE Symposium on Logic in Computer Science.
[69] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[70] Yuan F. Zheng,et al. Reinforcement learning for a biped robot to climb sloping surfaces , 1997, J. Field Robotics.
[71] Judy A. Franklin,et al. Biped dynamic walking using reinforcement learning , 1997, Robotics Auton. Syst..
[72] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[73] J. Spall,et al. Optimal random perturbations for stochastic approximation using a simultaneous perturbation gradient approximation , 1997, Proceedings of the 1997 American Control Conference (Cat. No.97CH36041).
[74] Geoffrey E. Hinton,et al. Using Expectation-Maximization for Reinforcement Learning , 1997, Neural Computation.
[75] Stefan Schaal,et al. Robot Learning From Demonstration , 1997, ICML.
[76] Vijay Balasubramanian,et al. Statistical Inference, Occam's Razor, and Statistical Mechanics on the Space of Probability Distributions , 1996, Neural Computation.
[77] Anders Rantzer,et al. Computation of piecewise quadratic Lyapunov functions for hybrid systems , 1997, 1997 European Control Conference (ECC).
[78] D. Harville. Matrix Algebra From a Statistician's Perspective , 1998 .
[79] Mitsuo Kawato,et al. A tennis serve and upswing learning robot based on bi-directional theory , 1998, Neural Networks.
[80] Andrew G. Barto,et al. Reinforcement learning , 1998 .
[81] Mitsuo Kawato,et al. Multiple Paired Forward-Inverse Models for Human Motor Learning and Control , 1998, NIPS.
[82] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[83] Jerry E. Pratt,et al. Intuitive control of a planar bipedal walking robot , 1998, Proceedings. 1998 IEEE International Conference on Robotics and Automation (Cat. No.98CH36146).
[84] Shigenobu Kobayashi,et al. An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function , 1998, ICML.
[85] P. Caines,et al. Hierarchical hybrid control systems: a lattice theoretic formulation , 1998, IEEE Trans. Autom. Control..
[86] Randal W. Beard. Successive Galerkin approximation algorithms for nonlinear optimal and robust control , 1998 .
[87] M. Branicky. Multiple Lyapunov functions and other analysis tools for switched and hybrid systems , 1998, IEEE Trans. Autom. Control..
[88] D M Wolpert,et al. Multiple paired forward and inverse models for motor control , 1998, Neural Networks.
[89] Daniel E. Koditschek,et al. Sequential Composition of Dynamically Dexterous Robot Behaviors , 1999, Int. J. Robotics Res..
[90] Mitsuo Kawato,et al. Internal models for motor control and trajectory planning , 1999, Current Opinion in Neurobiology.
[91] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[92] J. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes: implementation issues , 1999, Proceedings of the 38th IEEE Conference on Decision and Control (Cat. No.99CH36304).
[93] S. Schaal,et al. Segmentation of endpoint trajectories does not imply segmented control , 1999, Experimental Brain Research.
[94] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[95] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[96] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[97] T. Moon,et al. Mathematical Methods and Algorithms for Signal Processing , 1999 .
[98] J. Spall,et al. Simulation-Based Optimization with Stochastic Approximation Using Common Random Numbers , 1999 .
[99] D. Boussaoud,et al. Gaze effects in the cerebral cortex: reference frames for space coding and action , 1999, Experimental Brain Research.
[100] Michael D. Lemmon,et al. Supervisory hybrid systems , 1999 .
[101] K. Edström. Switched Bond Graphs: Simulation and Analysis , 1999 .
[102] Stefan Schaal,et al. Is imitation learning the route to humanoid robots? , 1999, Trends in Cognitive Sciences.
[103] A. Wing,et al. Motor control: Mechanisms of motor equivalence in handwriting , 2000, Current Biology.
[104] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[105] Changjiu Zhou,et al. Reinforcement learning with fuzzy evaluative feedback for a biped robot , 2000, Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065).
[106] Martin Buss,et al. Towards Hybrid Optimal Control , 2000 .
[107] William Leithead,et al. Survey of gain-scheduling analysis and design , 2000 .
[108] Oskar von Stryk,et al. Towards optimal hybrid control solutions for gait patterns of a quadruped , 2000 .
[109] O. V. Stryk,et al. Decomposition of Mixed-Integer Optimal Control Problems Using Branch and Bound and Sparse Direct Collocation , 2000 .
[110] Oussama Khatib,et al. Gauss' principle and the dynamics of redundant and constrained manipulators , 2000, Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065).
[111] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[112] Stefan Schaal,et al. Inverse kinematics for humanoid robots , 2000, Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065).
[113] J. Baxter,et al. Direct gradient-based reinforcement learning , 2000, 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No.00CH36353).
[114] Andrew G. Barto,et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density , 2001, ICML.
[115] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[116] Stefan Schaal,et al. Learning inverse kinematics , 2001, Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No.01CH37180).
[117] Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.
[118] Sham M. Kakade,et al. Optimizing Average Reward Using Discounted Rewards , 2001, COLT/EuroCOLT.
[119] Peter L. Bartlett,et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[120] Jun Morimoto,et al. Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning , 2000, Robotics Auton. Syst..
[121] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[122] A. Berny,et al. Statistical machine learning and combinatorial optimization , 2001 .
[123] Richard S. Sutton,et al. Comparing Policy-Gradient Algorithms , 2001 .
[124] Andrew G. Barto,et al. Lyapunov-Constrained Action Sets for Reinforcement Learning , 2001, ICML.
[125] Jun Nakanishi,et al. Trajectory formation for imitation with nonlinear dynamical systems , 2001, Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No.01CH37180).
[126] Aude Billard,et al. Learning human arm movements by imitation: Evaluation of a biologically inspired connectionist architecture , 2000, Robotics Auton. Syst..
[127] Shin Ishii,et al. Reinforcement Learning for Biped Locomotion , 2002, ICANN.
[128] Alison L Gibbs,et al. On Choosing and Bounding Probability Metrics , 2002, math/0209021.
[129] Mitsuo Kawato,et al. Multiple Model-Based Reinforcement Learning , 2002, Neural Computation.
[130] Peter L. Bartlett,et al. An Introduction to Reinforcement Learning Theory: Value Function Methods , 2002, Machine Learning Summer School.
[131] Jessica K. Hodgins,et al. Generalizing Demonstrated Manipulation Tasks , 2002, WAFR.
[132] Jun Nakanishi,et al. Learning Attractor Landscapes for Learning Motor Primitives , 2002, NIPS.
[133] A. Albu-Schäffer. Regelung von Robotern mit elastischen Gelenken am Beispiel der DLR-Leichtbauarme , 2002 .
[134] Jonghoon Park,et al. Characterization of instability of dynamic control for kinematically redundant manipulators , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).
[135] R. Amit,et al. Learning movement sequences from demonstration , 2002, Proceedings 2nd International Conference on Development and Learning. ICDL 2002.
[136] Jun Nakanishi,et al. Movement imitation with nonlinear dynamical systems in humanoid robots , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).
[137] Jun Nakanishi,et al. Learning rhythmic movements by demonstration using nonlinear oscillators , 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[138] Ralf Schoknecht,et al. Optimality of Reinforcement Learning Algorithms with Linear Function Approximation , 2002, NIPS.
[139] Alin Albu-Schäffer,et al. DLR's torque-controlled light weight robot III: are we reaching the technological limits now? , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).
[140] Stefan Schaal,et al. Computational elements of robot learning by imitation , 2002 .
[141] Yoshihiko Nakamura,et al. Acquisition and embodiment of motion elements in closed mimesis loop , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).
[142] S. Żak. Systems and control , 2002 .
[143] Panos J. Antsaklis,et al. An approach to optimal control of switched systems with internally forced switchings , 2002, Proceedings of the 2002 American Control Conference (IEEE Cat. No.CH37301).
[144] Jun Morimoto,et al. Minimax Differential Dynamic Programming: An Application to Robust Biped Walking , 2002, NIPS.
[145] K. Dautenhahn,et al. Imitation in Animals and Artifacts , 2002 .
[146] Jun Morimoto,et al. Robust low-torque biped walking using differential dynamic programming with a minimax criterion , 2002 .
[147] John N. Tsitsiklis,et al. Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes , 2003, Discret. Event Dyn. Syst..
[148] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[149] James C. Spall,et al. Introduction to stochastic search and optimization - estimation, simulation, and control , 2003, Wiley-Interscience series in discrete mathematics and optimization.
[150] Jun Nakanishi,et al. Control, Planning, Learning, and Imitation with Dynamic Movement Primitives , 2003 .
[151] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[152] Jeff G. Schneider,et al. Covariant policy search , 2003, IJCAI 2003.
[153] Stefan Schaal,et al. , 2007 .
[154] Jun Nakanishi,et al. Learning Movement Primitives , 2005, ISRR.
[155] Mitsuo Kawato,et al. Inter-module credit assignment in modular reinforcement learning , 2003, Neural Networks.
[156] Sethu Vijayakumar,et al. Scaling Reinforcement Learning Paradigms for Motor Learning , 2003 .
[157] Jeff G. Schneider,et al. Policy Search by Dynamic Programming , 2003, NIPS.
[158] Noah J. Cowan,et al. Efficient Gradient Estimation for Motor Control Learning , 2002, UAI.
[159] Katsu Yamane,et al. Natural Motion Animation through Constraining and Deconstraining at Will , 2003, IEEE Trans. Vis. Comput. Graph..
[160] F. Udwadia. A new perspective on the tracking control of nonlinear structural and mechanical systems , 2003, Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.
[161] O. Khatib. Task-Oriented Control of Humanoid Robots Through Prioritization , 2004 .
[162] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
[163] Peggy Fidelman,et al. Learning Ball Acquisition on a Physical Robot , 2004 .
[164] Shin Ishii,et al. Reinforcement Learning for CPG-Driven Biped Robot , 2004, AAAI.
[165] Tim Hesterberg,et al. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control , 2004, Technometrics.
[166] Peter Stone,et al. Policy gradient reinforcement learning for fast quadrupedal locomotion , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.
[167] Nancy S. Pollard,et al. Closure and Quality Equivalence for Efficient Synthesis of Grasps from Examples , 2004, Int. J. Robotics Res..
[168] Emanuel Todorov,et al. Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems , 2004, ICINCO.
[169] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[170] Stefan Schaal,et al. Scalable Techniques from Nonparametric Statistics for Real Time Robot Learning , 2002, Applied Intelligence.
[171] Jun Tani,et al. Motor primitive and sequence self-organization in a hierarchical recurrent neural network , 2004, Neural Networks.
[172] Jun Nakanishi,et al. Learning composite adaptive control for a class of nonlinear systems , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.
[173] Shin Ishii,et al. Natural Policy Gradient Reinforcement Learning for a CPG Control of a Biped Robot , 2004, PPSN.
[174] Yoshihiko Nakamura,et al. Embodied Symbol Emergence Based on Mimesis Theory , 2004, Int. J. Robotics Res..
[175] Oussama Khatib,et al. Whole-Body Dynamic Behavior and Control of Human-like Robots , 2004, Int. J. Humanoid Robotics.
[176] Jun Morimoto,et al. Learning from demonstration and adaptation of biped locomotion , 2004, Robotics Auton. Syst..
[177] Mitsuo Kawato,et al. A theory for cursive handwriting based on the minimization principle , 1995, Biological Cybernetics.
[178] Oussama Khatib,et al. Prioritized multi-objective dynamics and control of robots in human environments , 2004, 4th IEEE/RAS International Conference on Humanoid Robots, 2004..
[179] Amos Storkey,et al. Advances in Neural Information Processing Systems 20 , 2007 .
[180] Jun Nakanishi,et al. A unifying methodology for the control of robotic systems , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[181] Oussama Khatib,et al. Control of Free-Floating Humanoid Robots Through Task Prioritization , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.
[182] Takayuki Kanda,et al. Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[183] A. D. Lewis,et al. Geometric control of mechanical systems : modeling, analysis, and design for simple mechanical control systems , 2005 .
[184] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[185] Stefan Schaal,et al. A New Methodology for Robot Controller Design , 2005 .
[186] Jun Nakanishi,et al. Comparative experiments on task space control with redundancy resolution , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[187] Jongho Kim,et al. An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm , 2005, CIS.
[188] Jun Morimoto,et al. Learning CPG Sensory Feedback with Policy Gradient for Biped Locomotion for a Full-Body Humanoid , 2005, AAAI.
[189] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[190] Douglas Aberdeen,et al. POMDPs and Policy Gradients , 2006 .
[191] Stefan Schaal,et al. Learning Operational Space Control , 2006, Robotics: Science and Systems.
[192] Jin Yu,et al. Natural Actor-Critic for Road Traffic Optimisation , 2006, NIPS.
[193] Stefan Schaal,et al. Policy Gradient Methods for Robotics , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[194] Shin Ishii,et al. Fast and Stable Learning of Quasi-Passive Dynamic Walking by an Unstable Biped Robot based on Off-Policy Natural Actor-Critic , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[195] Aude Billard,et al. Reinforcement learning for imitating constrained reaching movements , 2007, Adv. Robotics.
[196] J. Peters,et al. Using Reward-weighted Regression for Reinforcement Learning of Task Space Control , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[197] James C. Spall,et al. Introduction to Stochastic Search and Optimization. Estimation, Simulation, and Control (Spall, J.C. , 2007 .
[198] Stefan Schaal,et al. Reinforcement Learning for Operational Space Control , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.
[199] P. Glynn. Likelihood ratio gradient estimation: an overview , 2022 .