Paper information - Learning to Learn for Global Optimization of Black Box Functions


We present a learning-to-learn approach for training recurrent neural networks to perform black-box global optimization. In the meta-learning phase, we use a large set of smooth target functions to learn a recurrent neural network (RNN) optimizer, which is either a long short-term memory (LSTM) network or a differentiable neural computer (DNC). After learning, the RNN can be applied to learn policies in reinforcement learning, as well as to other black-box learning tasks, including continuous correlated bandits and experimental design. We compare this approach to Bayesian optimization, with emphasis on computation speed, horizon length, and exploration-exploitation trade-offs.
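The query loop implied by the abstract can be sketched roughly as follows. This is only an illustration of the interface, not the paper's implementation: it assumes minimization, uses a plain tanh recurrent cell in place of an LSTM or DNC, and leaves the weights random rather than meta-learned. The function name `run_rnn_optimizer` and all parameter names are hypothetical.

```python
import numpy as np

def run_rnn_optimizer(f, dim, horizon, hidden_size=16, seed=0):
    """Roll out an untrained recurrent optimizer on a black-box function f.

    At each step the cell receives the previous query and its observed value,
    updates its hidden state, and emits the next query point.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical random weights; in a learning-to-learn setting these
    # would be obtained in a meta-learning phase over many target functions.
    W_in = rng.normal(scale=0.5, size=(hidden_size, dim + 1))
    W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
    W_out = rng.normal(scale=0.5, size=(dim, hidden_size))

    h = np.zeros(hidden_size)   # recurrent state
    x = np.zeros(dim)           # previous query (dummy value before step 1)
    y = 0.0                     # previous observation (dummy value)
    values = []
    queries = []
    for _ in range(horizon):
        inp = np.concatenate([x, [y]])
        h = np.tanh(W_in @ inp + W_h @ h)   # state update
        x = np.tanh(W_out @ h)              # next query, kept inside [-1, 1]^dim
        y = float(f(x))                     # one black-box evaluation, no gradients
        queries.append(x)
        values.append(y)

    best = int(np.argmin(values))           # assumes we are minimizing f
    return queries[best], values[best], np.array(values)

# Usage: a smooth toy objective standing in for an expensive black box.
best_x, best_y, trace = run_rnn_optimizer(
    lambda x: float(np.sum((x - 0.3) ** 2)), dim=2, horizon=20
)
```

With random weights the queries are not expected to be good; the point is that each step costs one forward pass of the network and one evaluation of `f`, and that the horizon is fixed in advance.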
