Linear redundancy reduction learning

Abstract

Feature extraction from any combination of sensory stimuli can be seen as the detection of statistically correlated combinations of inputs. A mathematical framework that describes this fact is formulated using concepts from information theory. The key idea is to define a bijective, volume-conserving transformation that guarantees the transmission of all information from inputs to outputs without spurious generation of entropy. At the same time, this transformation constrains the distribution of the outputs so that the representation is factorial, i.e., the redundancy at the output layer is minimal. We formulate this novel unsupervised learning paradigm for a linear network; in the linear case the method converges to the principal component transformation. In contrast to the “infomax” principle, we minimize the mutual information between the output neurons, provided that the transformation conserves entropy in the vertical direction (from inputs to outputs).
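To make the linear case concrete, the sketch below is our own illustration (not the paper's learning rule; the 2-D rotation parametrization, NumPy usage, and all variable names are assumptions). It parametrizes the transformation as a rotation, which is volume-conserving (det W = 1), and minimizes the sum of output log-variances, which for Gaussian signals is, up to constants, the sum of output entropies and hence the mutual information between the outputs.

```python
# Minimal sketch of linear redundancy reduction under a volume-conserving
# map. Assumptions: 2-D Gaussian inputs, W restricted to rotations, and a
# plain numerical-gradient descent (the paper's actual rule may differ).

import numpy as np

rng = np.random.default_rng(0)

# Correlated Gaussian inputs (samples x features).
A = np.array([[2.0, 1.2],
              [0.0, 0.7]])
X = rng.standard_normal((5000, 2)) @ A.T
C = np.cov(X, rowvar=False)          # input covariance

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])   # det = 1: volume-conserving

def redundancy(theta):
    """Sum of output log-variances. Since det W = 1, H(y) = H(x), so
    minimising the sum of marginal output entropies minimises I(y1; y2)."""
    W = rotation(theta)
    Cy = W @ C @ W.T
    return np.sum(np.log(np.diag(Cy)))

# Gradient descent with a central-difference gradient.
theta, lr, eps = 0.3, 0.05, 1e-5
for _ in range(500):
    g = (redundancy(theta + eps) - redundancy(theta - eps)) / (2 * eps)
    theta -= lr * g

W = rotation(theta)
print("output covariance (≈ diagonal):\n", np.round(W @ C @ W.T, 4))

# The rows of W should match the eigenvectors of C (up to sign and order),
# i.e. the principal component transformation.
eigvals, eigvecs = np.linalg.eigh(C)
print("principal axes:\n", np.round(eigvecs.T, 4))
print("learned axes:\n", np.round(W, 4))
```

By Hadamard's inequality, the product of diagonal entries of the output covariance bounds its determinant, which a volume-conserving map leaves unchanged; the sum of output log-variances is therefore minimal exactly when the output covariance is diagonal, which is why this sketch converges to the principal component axes.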
