Resonator Networks, 2: Factorization Performance and Capacity Compared to Optimization-Based Methods

We develop theoretical foundations of resonator networks, a new type of recurrent neural network introduced in Frady, Kent, Olshausen, and Sommer (2020), a companion article in this issue, to solve a high-dimensional vector factorization problem arising in Vector Symbolic Architectures. Given a composite vector formed by the Hadamard product of a discrete set of high-dimensional vectors, a resonator network can efficiently decompose the composite into these factors. We compare the performance of resonator networks against optimization-based methods, including Alternating Least Squares and several gradient-based algorithms, showing that resonator networks are superior in several important ways. This advantage is achieved by leveraging a combination of nonlinear dynamics and searching in superposition, by which estimates of the correct solution are formed from a weighted superposition of all possible solutions. While the alternative methods also search in superposition, the dynamics of resonator networks allow them to strike a more effective balance between exploring the solution space and exploiting local information to drive the network toward probable solutions. Resonator networks are not guaranteed to converge, but within a particular regime they almost always do. In exchange for relaxing the guarantee of global convergence, resonator networks are dramatically more effective at finding factorizations than all alternative approaches considered.
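
For concreteness, the factorization problem and the resonator dynamics can be sketched in a few lines of NumPy. The following is a minimal illustration, not the implementation benchmarked in the article: it assumes bipolar (+1/-1) codevectors and uses the sign nonlinearity with codebook-projection update described in the companion article (Frady et al., 2020); all sizes (N, D, F) and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N, D, F = 1500, 30, 3   # vector dimension, codebook size, number of factors

# Random bipolar codebooks, one per factor (illustrative sizes).
codebooks = [rng.choice([-1, 1], size=(D, N)) for _ in range(F)]

# Ground-truth factors and their Hadamard-product composite c.
true_idx = rng.integers(0, D, size=F)
c = np.prod([codebooks[f][true_idx[f]] for f in range(F)], axis=0)

# Initialize each estimate to the superposition of its entire codebook.
est = [np.sign(X.sum(axis=0)) for X in codebooks]
est = [np.where(e == 0, 1, e) for e in est]  # break sign ties

for t in range(200):
    for f in range(F):
        # Unbind: the elementwise product with the other estimates inverts
        # them, since bipolar vectors are their own multiplicative inverses.
        others = np.prod([est[g] for g in range(F) if g != f], axis=0)
        # Project onto the span of codebook f, then binarize with sign().
        e = np.sign(codebooks[f].T @ (codebooks[f] @ (c * others)))
        est[f] = np.where(e == 0, 1, e)
    if all(np.array_equal(est[f], codebooks[f][true_idx[f]]) for f in range(F)):
        break

decoded = [int(np.argmax(X @ e)) for X, e in zip(codebooks, est)]
print("decoded:", decoded, "true:", true_idx.tolist())
```

Because each estimate begins as a superposition of every codevector in its codebook, all candidate factorizations are explored simultaneously; the sign() nonlinearity after each codebook projection is what distinguishes these dynamics from the purely linear alternating projections used by the optimization-based baselines.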

[1] Anima Anandkumar et al. Tensor decompositions for learning latent variable models. J. Mach. Learn. Res., 2012.

[2] Demetri Terzopoulos et al. Multilinear Analysis of Image Ensembles: TensorFaces. ECCV, 2002.

[3] J. Chang et al. Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition. 1970.

[4] F. L. Hitchcock. The Expression of a Tensor or a Polyadic as a Sum of Products. 1927.

[5] Tamara G. Kolda et al. Tensor Decompositions and Applications. SIAM Rev., 2009.

[6] Shun-ichi Amari et al. Statistical neurodynamics of associative memory. Neural Networks, 1988.

[7] Patrik O. Hoyer et al. Non-negative Matrix Factorization with Sparseness Constraints. J. Mach. Learn. Res., 2004.

[8] J. Knott. The organization of behavior: A neuropsychological theory. 1951.

[9] Tony A. Plate et al. Holographic Reduced Representation: Distributed Representation for Cognitive Structures. 2003.

[10] J. Kruskal et al. Candelinc: A general approach to multidimensional analysis of many-way arrays with linear constraints on parameters. 1980.

[11] Marc Teboulle et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci., 2009.

[12] Laurent Condat. Fast projection onto the simplex and the l1 ball. 2016.

[13] K. Bredies et al. Linear Convergence of Iterative Soft-Thresholding. arXiv:0709.1598, 2007.

[14] Masaki Kobayashi et al. Multidirectional associative memory with a hidden layer. Systems and Computers in Japan, 2002.

[15] Ross W. Gayler. Vector Symbolic Architectures answer Jackendoff's challenges for cognitive neuroscience. arXiv, 2004.

[16] Elad Hazan et al. Introduction to Online Convex Optimization. Found. Trends Optim., 2016.

[17] Gábor Lugosi et al. Concentration Inequalities. COLT, 2008.

[18] L. Tucker et al. Some mathematical notes on three-mode factor analysis. Psychometrika, 1966.

[19] Laurent Condat. A Fast Projection onto the Simplex and the l1 Ball. 2015.

[20] D. W. Arathorn. Recognition under transformation using superposition ordering property. 2001.

[21] Joos Vandewalle et al. A Multilinear Singular Value Decomposition. SIAM J. Matrix Anal. Appl., 2000.

[22] Personnaz et al. Collective computational properties of neural networks: New learning mechanisms. Physical Review A, 1986.

[23] H. Sebastian Seung et al. Algorithms for Non-negative Matrix Factorization. NIPS, 2000.

[24] Chandan Dasgupta et al. Retrieval Properties of a Hopfield Model with Random Asymmetric Interactions. Neural Computation, 2000.

[25] Pentti Kanerva et al. Binary Spatter-Coding of Ordered K-Tuples. ICANN, 1996.

[26] H. Sompolinsky et al. Chaos in Neuronal Networks with Balanced Excitatory and Inhibitory Activity. Science, 1996.

[27] Joos Vandewalle et al. On the Best Rank-1 and Rank-(R1, R2, ..., RN) Approximation of Higher-Order Tensors. 2000.

[28] Bruno A. Olshausen et al. Resonator Networks, 1: An Efficient Solution for Factoring High-Dimensional, Distributed Representations of Data Structures. Neural Computation, 2020.

[29] Jan M. Rabaey et al. High-Dimensional Computing as a Nanoscalable Paradigm. IEEE Transactions on Circuits and Systems I: Regular Papers, 2017.

[30] Tim Hesterberg et al. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Technometrics, 2004.

[31] Christopher J. Hillar et al. Most Tensor Problems Are NP-Hard. J. ACM, 2009.

[32] Sanjeev Arora et al. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory Comput., 2012.

[33] Bart Kosko et al. Bidirectional associative memories. IEEE Trans. Syst. Man Cybern., 1988.

[34] Sompolinsky et al. Information storage in neural networks with low levels of activity. Physical Review A, 1987.

[35] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 1982.

[36] J. J. Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, 1984.

[37] N. Sidiropoulos et al. On the uniqueness of multilinear decomposition of N-way arrays. 2000.

[38] Friedrich T. Sommer et al. Robust computation with rhythmic spike patterns. Proceedings of the National Academy of Sciences, 2019.

[39] Stephen Grossberg et al. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics, 1983.

[40] Richard A. Harshman et al. Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis. 1970.

[41] S. Solla et al. Memory networks with asymmetric bonds. 1987.

[42] Laurent Condat. Fast projection onto the simplex and the l1 ball. Mathematical Programming, 2015.

[43] Lynn Elliot Cannon et al. A cellular computer to implement the Kalman filter algorithm. 1969.

[44] Zongben Xu et al. Asymmetric Hopfield-type networks: Theory and applications. Neural Networks, 1996.

[45] Sompolinsky et al. Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 1985.

[46] J. J. Hopfield et al. “Neural” computation of decisions in optimization problems. Biological Cybernetics, 1985.

[47] Geoffrey E. Hinton et al. Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines. Neural Computation, 2010.

[48] Yoram Singer et al. Efficient projections onto the l1-ball for learning in high dimensions. ICML, 2008.

[49] I. Daubechies et al. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. arXiv:math/0307152, 2003.

[50] Tomás Gedeon et al. Analysis of Constrained Optimization Variants of the Map-Seeking Circuit Algorithm. Journal of Mathematical Imaging and Vision, 2007.

[51] Joshua B. Tenenbaum et al. Separating Style and Content with Bilinear Models. Neural Computation, 2000.

[52] J. Kruskal. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. 1977.

[53] Tomás Gedeon et al. Convergence of Map Seeking Circuits. Journal of Mathematical Imaging and Vision, 2007.

[54] David W. Arathorn. Map-Seeking Circuits in Visual Cognition: A Computational Mechanism for Biological and Machine Vision. 2002.

[55] Philip Wolfe et al. Validation of subgradient optimization. Math. Program., 1974.

[56] Nikos D. Sidiropoulos et al. Tensor Decomposition for Signal Processing and Machine Learning. IEEE Transactions on Signal Processing, 2016.

[57] Chengxiang et al. Fixed points in a Hopfield model with random asymmetric interactions. Physical Review E, 1995.

[58] Ross W. Gayler et al. Multiplicative Binding, Representation Operators & Analogy. 1998.

[59] Joos Vandewalle et al. On the Best Rank-1 and Rank-(R1, R2, ..., RN) Approximation of Higher-Order Tensors. SIAM J. Matrix Anal. Appl., 2000.