[1] J. Nocedal. Updating Quasi-Newton Matrices with Limited Storage, 1980.
[2] Yann LeCun et al. Improving the Convergence of Back-Propagation Learning with Second-Order Methods, 1989.
[3] Barak A. Pearlmutter et al. Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors, NIPS, 1992.
[4] A. Genz. Methods for Generating Random Orthogonal Matrices, 2000.
[5] Simon Günter et al. A Stochastic Quasi-Newton Method for Online Convex Optimization, AISTATS, 2007.
[6] Patrick Gallinari et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent, J. Mach. Learn. Res., 2009.
[7] Quoc V. Le et al. On Optimization Methods for Deep Learning, ICML, 2011.
[8] O. Chapelle. Improved Preconditioner for Hessian Free Optimization, 2011.
[9] Charles A. Bouman et al. The Sparse Matrix Transform for Covariance Estimation and Analysis of High Dimensional Signals, IEEE Transactions on Image Processing, 2011.
[10] Marc'Aurelio Ranzato et al. Large Scale Distributed Deep Networks, NIPS, 2012.
[11] Klaus-Robert Müller et al. Efficient BackProp, Neural Networks: Tricks of the Trade, 2012.
[12] Ilya Sutskever et al. Estimating the Hessian by Back-propagating Curvature, ICML, 2012.