Policy Search with High-Dimensional Context Variables

Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction methods to the context variables, such as principal component analysis, is insufficient as task-relevant input may be ignored. In this paper, we propose a contextual policy search method in the model-based relative entropy stochastic search framework with integrated dimensionality reduction. We learn a model of the reward that is locally quadratic in both the policy parameters and the context variables. Furthermore, we perform supervised linear dimensionality reduction on the context variables by nuclear norm regularization. The experimental results show that the proposed method outperforms naive dimensionality reduction via principal component analysis and a state-of-the-art contextual policy search method.

[1]  Peder A. Olsen,et al.  Nuclear Norm Minimization via Active Subspace Selection , 2014, ICML.

[2]  Luís Paulo Reis,et al.  Model-Based Relative Entropy Stochastic Search , 2016, NIPS.

[3]  Stefan Schaal,et al.  Locally Weighted Projection Regression: Incremental Real Time Learning in High Dimensional Space , 2000, ICML.

[4]  Huaiyu Zhu On Information and Sufficiency , 1997 .

[5]  Petros Koumoutsakos,et al.  Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES) , 2003, Evolutionary Computation.

[6]  Jan Peters,et al.  Reinforcement Learning to Adjust Robot Movements to New Situations , 2010, IJCAI.

[7]  J. Friedman,et al.  Projection Pursuit Regression , 1981 .

[8]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[9]  Andrew B. Nobel,et al.  Supervised singular value decomposition and its asymptotic properties , 2016, J. Multivar. Anal..

[10]  Emmanuel J. Candès,et al.  The Power of Convex Relaxation: Near-Optimal Matrix Completion , 2009, IEEE Transactions on Information Theory.

[11]  Mehmet Gönen,et al.  Bayesian Supervised Dimensionality Reduction , 2013, IEEE Transactions on Cybernetics.

[12]  Jun Nakanishi,et al.  Learning Attractor Landscapes for Learning Motor Primitives , 2002, NIPS.

[13]  Nathan Halko,et al.  Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions , 2009, SIAM Rev..

[14]  Gerhard Neumann,et al.  Variational Inference for Policy Search in changing situations , 2011, ICML.

[15]  Masashi Sugiyama,et al.  Sufficient Dimension Reduction via Squared-Loss Mutual Information Estimation , 2010, Neural Computation.

[16]  S. Yun,et al.  An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems , 2009 .

[17]  Paul Tseng,et al.  Trace Norm Regularization: Reformulations, Algorithms, and Multi-Task Learning , 2010, SIAM J. Optim..

[18]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[19]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[20]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[21]  Martin A. Riedmiller,et al.  Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.

[22]  Jan Peters,et al.  A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.

[23]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[24]  Andrew G. Barto,et al.  Robot Weightlifting By Direct Policy Search , 2001, IJCAI.

[25]  S. Yun,et al.  An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems , 2009 .

[26]  Bruno Castro da Silva,et al.  Learning Parameterized Skills , 2012, ICML.

[27]  Peter L. Bartlett,et al.  Reinforcement Learning in POMDP's via Direct Gradient Ascent , 2000, ICML.

[28]  Michael I. Jordan,et al.  Kernel dimension reduction in regression , 2009, 0908.1854.

[29]  Pablo A. Parrilo,et al.  Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization , 2007, SIAM Rev..

[30]  Lorenz T. Biegler,et al.  On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming , 2006, Math. Program..