projUNN: efficient method for training deep networks with unitary matrices

In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-$k$ updates (or their rank-$k$ approximation) that maintains performance at a nearly optimal training runtime. We introduce two variants of this method, named Direct (projUNN-D) and Tangent (projUNN-T) projected Unitary Neural Networks, that can parameterize full $N$-dimensional unitary or orthogonal matrices with a training runtime scaling as $O(kN^2)$. Our method either projects low-rank gradients onto the closest unitary matrix (projUNN-D) or transports unitary matrices in the direction of the low-rank gradient (projUNN-T). Even in the fastest setting ($k=1$), projUNN is able to train a model's unitary parameters to reach performance comparable to baseline implementations. In recurrent neural network settings, projUNN closely matches or exceeds benchmarked results from prior unitary neural networks. Finally, we preliminarily explore projUNN for training orthogonal convolutional neural networks, which currently do not outperform state-of-the-art models but can potentially enhance stability and robustness at large depth.
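As a rough, hypothetical illustration of the projUNN-D idea (a rank-$k$ gradient step followed by projection onto the closest unitary matrix, given by the polar decomposition), the sketch below uses full SVDs for both the truncation and the projection. All function names are ours, and the projection here costs $O(N^3)$; the paper's actual contribution is a low-rank routine that brings the whole step to $O(kN^2)$, which this sketch does not reproduce.

```python
import numpy as np

def rank_k_approx(G, k):
    """Best rank-k approximation of the gradient G via truncated SVD."""
    U, s, Vh = np.linalg.svd(G, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

def closest_unitary(A):
    """Closest unitary matrix to A in Frobenius norm (polar decomposition)."""
    U, _, Vh = np.linalg.svd(A)
    return U @ Vh

def projunn_d_step(W, G, lr, k):
    """One projUNN-D-style step: rank-k gradient update, then project back
    onto the unitary (here: real orthogonal) group."""
    return closest_unitary(W - lr * rank_k_approx(G, k))

# Toy usage: the parameter matrix stays orthogonal after each update.
rng = np.random.default_rng(0)
W = closest_unitary(rng.standard_normal((8, 8)))  # random orthogonal start
G = rng.standard_normal((8, 8))                   # stand-in for a loss gradient
W = projunn_d_step(W, G, lr=0.1, k=1)
print(np.allclose(W @ W.T, np.eye(8)))            # True: orthogonality preserved
```

The design point this sketch makes concrete is that the unitarity constraint is enforced exactly after every update rather than approximately via regularization, so stability does not drift over long training runs.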
