Optimization of Entropy with Neural Networks

ABSTRACT OF THE DISSERTATION

Optimization of Entropy with Neural Networks

by Nicol Norbert Schraudolph
Doctor of Philosophy in Cognitive Science and Computer Science
University of California, San Diego, 1995
Professor Terrence J. Sejnowski, Chair

The goal of unsupervised learning algorithms is to discover concise yet informative representations of large data sets; the minimum description length principle and exploratory projection pursuit are two representative attempts to formalize this notion. When implemented with neural networks, both suggest the minimization of entropy at the network's output as an objective for unsupervised learning. Unfortunately, the empirical computation of entropy, or of its derivative with respect to the parameters of a neural network, requires explicit knowledge of the local data density; this information is typically not available when learning from data samples. This dissertation discusses and applies three methods for making density information accessible in a neural network: parametric modelling, probabilistic networks, and nonparametric estimation. By imposing their own structure on the data, parametric density models implement impoverished but tractable forms of entropy such as the log-variance. We have used this method to improve the adaptive dynamics of an anti-Hebbian learning rule that has proven successful in extracting disparity from random stereograms.
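The reduction of entropy to log-variance under a parametric Gaussian model can be illustrated with a minimal sketch (not taken from the dissertation; the function names are illustrative): the differential entropy of a Gaussian with variance σ² is ½·log(2πe·σ²), which differs from ½·log σ² only by an additive constant, so minimizing the model entropy of a network output is equivalent to minimizing its log-variance.

```python
import numpy as np

def gaussian_entropy(y):
    """Differential entropy of samples y under a fitted Gaussian model:
    H = 0.5 * log(2 * pi * e * var(y))."""
    return 0.5 * np.log(2.0 * np.pi * np.e * np.var(y))

def log_variance(y):
    """The impoverished but tractable entropy surrogate: log var(y)."""
    return np.log(np.var(y))

rng = np.random.default_rng(0)
y = rng.normal(loc=0.0, scale=2.0, size=10_000)

# The two objectives differ only by the constant 0.5 * log(2*pi*e),
# so their gradients with respect to network parameters are
# proportional, and minimizing one minimizes the other.
offset = gaussian_entropy(y) - 0.5 * log_variance(y)
print(offset)  # equals 0.5 * log(2*pi*e), independent of the data
```

Because the offset is data-independent, any gradient-based learning rule driven by the Gaussian entropy estimate behaves identically to one driven by the log-variance.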
