Hybrid control trajectory optimization under uncertainty

Trajectory optimization is a fundamental problem in robotics. While optimization of continuous control trajectories is well developed, many applications require both discrete and continuous controls, i.e., hybrid controls. Finding an optimal sequence of hybrid controls is challenging because the number of discrete control combinations grows exponentially with the length of the sequence. Our method, based on Differential Dynamic Programming (DDP), circumvents this problem by incorporating discrete actions inside DDP: we first optimize continuous mixtures of discrete actions and subsequently force the mixtures into fully discrete actions. Moreover, we show how our approach can be extended to partially observable Markov decision processes (POMDPs) for trajectory planning under uncertainty. We validate the approach on a car driving problem in which the robot has to switch discrete gears, and on a box pushing application in which the robot can switch the side of the box it pushes. In the box pushing application, the pose and the friction parameters of the pushed box are initially unknown and only indirectly observable.
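The idea of replacing a discrete action with a continuous mixture, and later forcing that mixture back onto a single action, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three-gear acceleration model, its made-up ratios and speed limits, the Euler step, and the function names `gear_accel`, `mixed_step`, and `harden`. In particular, `harden` simply rounds to the largest weight as a stand-in, since the abstract does not specify how the mixtures are forced into discrete actions, and no DDP optimization is performed here.

```python
import numpy as np

# Hypothetical gear parameters (illustrative values, not from the paper).
RATIOS = np.array([3.0, 2.0, 1.0])     # acceleration gain per gear
V_MAX = np.array([10.0, 20.0, 30.0])   # speed at which each gear stops accelerating


def gear_accel(v, throttle, gear):
    """Acceleration produced by one discrete gear (toy model)."""
    return throttle * RATIOS[gear] * max(0.0, 1.0 - v / V_MAX[gear])


def mixed_step(v, throttle, weights, dt=0.1):
    """One Euler step in which the gear is a convex mixture of all gears.

    `weights` must be non-negative and sum to one; a one-hot vector
    recovers the ordinary single-gear dynamics.
    """
    weights = np.asarray(weights, dtype=float)
    accel = sum(w * gear_accel(v, throttle, g) for g, w in enumerate(weights))
    return v + dt * accel


def harden(weights):
    """Force a mixture onto one discrete action by picking the largest weight.

    This rounding rule is a stand-in chosen for the sketch.
    """
    weights = np.asarray(weights, dtype=float)
    one_hot = np.zeros_like(weights)
    one_hot[np.argmax(weights)] = 1.0
    return one_hot


if __name__ == "__main__":
    mixture = [0.2, 0.5, 0.3]
    print("mixed step:   ", mixed_step(5.0, 1.0, mixture))
    print("hardened step:", mixed_step(5.0, 1.0, harden(mixture)))
```

Because the mixture weights are continuous, the relaxed dynamics are differentiable with respect to them, which is what lets a gradient-based trajectory optimizer treat the gear choice like any other continuous control before the weights are pushed to a one-hot vector.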
