Resonator Networks, 2: Factorization Performance and Capacity Compared to Optimization-Based Methods

We develop theoretical foundations of resonator networks, a new type of recurrent neural network introduced in Frady, Kent, Olshausen, and Sommer (2020), a companion article in this issue, to solve a high-dimensional vector factorization problem arising in Vector Symbolic Architectures. Given a composite vector formed by the Hadamard product of a discrete set of high-dimensional vectors, a resonator network can efficiently decompose the composite into these factors. We compare the performance of resonator networks against optimization-based methods, including Alternating Least Squares and several gradient-based algorithms, showing that resonator networks are superior in several important ways. This advantage is achieved by leveraging a combination of nonlinear dynamics and searching in superposition, by which estimates of the correct solution are formed from a weighted superposition of all possible solutions. While the alternative methods also search in superposition, the dynamics of resonator networks allow them to strike a more effective balance between exploring the solution space and exploiting local information to drive the network toward probable solutions. Resonator networks are not guaranteed to converge, but within a particular regime they almost always do. In exchange for relaxing the guarantee of global convergence, resonator networks are dramatically more effective at finding factorizations than all alternative approaches considered.
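
For concreteness, the factorization problem and the resonator dynamics can be sketched in a few lines of NumPy. The following is a minimal illustration, not the implementation benchmarked in the article: it assumes bipolar (+1/-1) codevectors and uses the sign nonlinearity with codebook-projection update described in the companion article (Frady et al., 2020); all sizes (N, D, F) and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N, D, F = 1500, 30, 3   # vector dimension, codebook size, number of factors

# Random bipolar codebooks, one per factor (illustrative sizes).
codebooks = [rng.choice([-1, 1], size=(D, N)) for _ in range(F)]

# Ground-truth factors and their Hadamard-product composite c.
true_idx = rng.integers(0, D, size=F)
c = np.prod([codebooks[f][true_idx[f]] for f in range(F)], axis=0)

# Initialize each estimate to the superposition of its entire codebook.
est = [np.sign(X.sum(axis=0)) for X in codebooks]
est = [np.where(e == 0, 1, e) for e in est]  # break sign ties

for t in range(200):
    for f in range(F):
        # Unbind: the elementwise product with the other estimates inverts
        # them, since bipolar vectors are their own multiplicative inverses.
        others = np.prod([est[g] for g in range(F) if g != f], axis=0)
        # Project onto the span of codebook f, then binarize with sign().
        e = np.sign(codebooks[f].T @ (codebooks[f] @ (c * others)))
        est[f] = np.where(e == 0, 1, e)
    if all(np.array_equal(est[f], codebooks[f][true_idx[f]]) for f in range(F)):
        break

decoded = [int(np.argmax(X @ e)) for X, e in zip(codebooks, est)]
print("decoded:", decoded, "true:", true_idx.tolist())
```

Because each estimate begins as a superposition of every codevector in its codebook, all candidate factorizations are explored simultaneously; the sign() nonlinearity after each codebook projection is what distinguishes these dynamics from the purely linear alternating projections used by the optimization-based baselines.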

[1] Anima Anandkumar et al. Tensor decompositions for learning latent variable models. J. Mach. Learn. Res., 2012.

[2] Demetri Terzopoulos et al. Multilinear Analysis of Image Ensembles: TensorFaces. ECCV, 2002.

[3] J. Chang et al. Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition. 1970.

[4] F. L. Hitchcock. The Expression of a Tensor or a Polyadic as a Sum of Products. 1927.

[5] Tamara G. Kolda et al. Tensor Decompositions and Applications. SIAM Rev., 2009.

[6] Shun-ichi Amari et al. Statistical neurodynamics of associative memory. Neural Networks, 1988.

[7] Patrik O. Hoyer et al. Non-negative Matrix Factorization with Sparseness Constraints. J. Mach. Learn. Res., 2004.

[8] J. Knott. The organization of behavior: A neuropsychological theory. 1951.

[9] Tony A. Plate et al. Holographic Reduced Representation: Distributed Representation for Cognitive Structures. 2003.

[10] J. Kruskal et al. Candelinc: A general approach to multidimensional analysis of many-way arrays with linear constraints on parameters. 1980.

[11] Marc Teboulle et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci., 2009.

[12] Laurent Condat. Fast projection onto the simplex and the l1 ball. 2016.

[13] K. Bredies et al. Linear Convergence of Iterative Soft-Thresholding. arXiv:0709.1598, 2007.

[14] Masaki Kobayashi et al. Multidirectional associative memory with a hidden layer. Systems and Computers in Japan, 2002.

[15] Ross W. Gayler. Vector Symbolic Architectures answer Jackendoff's challenges for cognitive neuroscience. arXiv, 2004.

[16] Elad Hazan et al. Introduction to Online Convex Optimization. Found. Trends Optim., 2016.

[17] Gábor Lugosi et al. Concentration Inequalities. COLT, 2008.

[18] L. Tucker et al. Some mathematical notes on three-mode factor analysis. Psychometrika, 1966.

[19] Laurent Condat. A Fast Projection onto the Simplex and the l1 Ball. 2015.

[20] D. W. Arathorn. Recognition under transformation using superposition ordering property. 2001.

[21] Joos Vandewalle et al. A Multilinear Singular Value Decomposition. SIAM J. Matrix Anal. Appl., 2000.

[22] Personnaz et al. Collective computational properties of neural networks: New learning mechanisms. Physical Review A, 1986.

[23] H. Sebastian Seung et al. Algorithms for Non-negative Matrix Factorization. NIPS, 2000.

[24] Chandan Dasgupta et al. Retrieval Properties of a Hopfield Model with Random Asymmetric Interactions. Neural Computation, 2000.

[25] Pentti Kanerva et al. Binary Spatter-Coding of Ordered K-Tuples. ICANN, 1996.

[26] H. Sompolinsky et al. Chaos in Neuronal Networks with Balanced Excitatory and Inhibitory Activity. Science, 1996.

[27] Joos Vandewalle et al. On the Best Rank-1 and Rank-(R1, R2, ..., RN) Approximation of Higher-Order Tensors. 2000.

[28] Bruno A. Olshausen et al. Resonator Networks, 1: An Efficient Solution for Factoring High-Dimensional, Distributed Representations of Data Structures. Neural Computation, 2020.

[29] Jan M. Rabaey et al. High-Dimensional Computing as a Nanoscalable Paradigm. IEEE Transactions on Circuits and Systems I: Regular Papers, 2017.

[30] Tim Hesterberg et al. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Technometrics, 2004.

[31] Christopher J. Hillar et al. Most Tensor Problems Are NP-Hard. J. ACM, 2009.

[32] Sanjeev Arora et al. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory Comput., 2012.

[33] Bart Kosko et al. Bidirectional associative memories. IEEE Trans. Syst. Man Cybern., 1988.

[34] Sompolinsky et al. Information storage in neural networks with low levels of activity. Physical Review A, 1987.

[35] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 1982.

[36] J. J. Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, 1984.

[37] N. Sidiropoulos et al. On the uniqueness of multilinear decomposition of N-way arrays. 2000.

[38] Friedrich T. Sommer et al. Robust computation with rhythmic spike patterns. Proceedings of the National Academy of Sciences, 2019.

[39] Stephen Grossberg et al. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics, 1983.

[40] Richard A. Harshman et al. Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis. 1970.

[41] S. Solla et al. Memory networks with asymmetric bonds. 1987.

[42] Laurent Condat. Fast projection onto the simplex and the l1 ball. Mathematical Programming, 2015.

[43] Lynn Elliot Cannon et al. A cellular computer to implement the Kalman filter algorithm. 1969.

[44] Zongben Xu et al. Asymmetric Hopfield-type networks: Theory and applications. Neural Networks, 1996.

[45] Sompolinsky et al. Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 1985.

[46] J. J. Hopfield et al. “Neural” computation of decisions in optimization problems. Biological Cybernetics, 1985.

[47] Geoffrey E. Hinton et al. Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines. Neural Computation, 2010.

[48] Yoram Singer et al. Efficient projections onto the l1-ball for learning in high dimensions. ICML, 2008.

[49] I. Daubechies et al. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. arXiv:math/0307152, 2003.

[50] Tomás Gedeon et al. Analysis of Constrained Optimization Variants of the Map-Seeking Circuit Algorithm. Journal of Mathematical Imaging and Vision, 2007.

[51] Joshua B. Tenenbaum et al. Separating Style and Content with Bilinear Models. Neural Computation, 2000.

[52] J. Kruskal. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. 1977.

[53] Tomás Gedeon et al. Convergence of Map Seeking Circuits. Journal of Mathematical Imaging and Vision, 2007.

[54] David W. Arathorn. Map-Seeking Circuits in Visual Cognition: A Computational Mechanism for Biological and Machine Vision. 2002.

[55] Philip Wolfe et al. Validation of subgradient optimization. Math. Program., 1974.

[56] Nikos D. Sidiropoulos et al. Tensor Decomposition for Signal Processing and Machine Learning. IEEE Transactions on Signal Processing, 2016.

[57] Chengxiang et al. Fixed points in a Hopfield model with random asymmetric interactions. Physical Review E, 1995.

[58] Ross W. Gayler et al. Multiplicative Binding, Representation Operators & Analogy. 1998.

[59] Joos Vandewalle et al. On the Best Rank-1 and Rank-(R1, R2, ..., RN) Approximation of Higher-Order Tensors. SIAM J. Matrix Anal. Appl., 2000.