Foundations of recurrent neural networks

"Artificial neural networks" provide an appealing model of computation. Such networks consist of an interconnection of a number of parallel agents, or "neurons." Each of these receives certain signals as inputs, computes some simple function, and produces a signal as output, which is in turn broadcast to the successive neurons involved in a given computation. Some of the signals originate from outside the network, and act as inputs to the whole system, while some of the output signals are communicated back to the environment and are used to encode the end result of computation. In this dissertation we focus on the "recurrent network" model, in which the underlying graph is not subject to any constraints. We investigate the computational power of neural nets, taking a classical computer science point of view. We characterize the language recognition power of networks in terms of the types of numbers (constants) utilized as weights. From a mathematical viewpoint, it is natural to consider integer, rational, and real numbers. From the standpoint of computer science, natural classes of formal languages are regular, recursive, and "all languages." We establish a precise correspondence between the mathematical and computing choices. Furthermore, when the computation time of the network is constrained to be polynomial in the input size, the classes recognized by the respective networks are: regular, P, and Analog-P, i.e. P/poly. Among other results described in this thesis are a proper hierarchy of networks using Kolmogorov-complexity characterizations, the imposition of space constraints, and a proposed "Church's thesis of analog computing."
