Unsupervised Discrimination of Clustered Data via Optimization of Binary Information Gain

We present the information-theoretic derivation of a learning algorithm that clusters unlabelled data with linear discriminants. In contrast to methods that try to preserve information about the input patterns, we maximize the information gained from observing the output of robust binary discriminators implemented with sigmoid nodes. We derive a local weight adaptation rule via gradient ascent on this objective, demonstrate its dynamics on simple data sets, relate our approach to previous work, and suggest directions in which it may be extended.
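
As a rough illustration of the idea, the sketch below trains a single sigmoid node on unlabelled two-cluster data by gradient ascent on a binary information gain. It is a simplified reading of the abstract, not the paper's actual rule: it assumes the reference estimate of the output is fixed at 0.5 (maximal uncertainty), so that maximizing the gain H(0.5) - H(z) reduces to minimizing the output entropy H(z). The function names, the toy data, the learning rate, and the absence of a bias term are all hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def binary_entropy(z, eps=1e-12):
    # Entropy (in nats) of a binary variable with P(1) = z.
    z = np.clip(z, eps, 1.0 - eps)
    return -(z * np.log(z) + (1.0 - z) * np.log(1.0 - z))

def make_two_clusters(n_per_cluster=100, seed=1):
    # Hypothetical toy data: two Gaussian blobs placed symmetrically
    # about the origin, so no bias term is needed in this sketch.
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=(2.0, 0.0), scale=0.3, size=(n_per_cluster, 2))
    b = rng.normal(loc=(-2.0, 0.0), scale=0.3, size=(n_per_cluster, 2))
    X = np.vstack([a, b])
    labels = np.concatenate([np.ones(n_per_cluster, dtype=int),
                             np.zeros(n_per_cluster, dtype=int)])
    return X, labels  # labels are used for evaluation only, never for training

def train_discriminant(X, n_epochs=50, lr=0.05, seed=0):
    # Assumed objective: gain = H(0.5) - H(z), with z = sigmoid(w . x).
    # Since dH/dz = log((1 - z) / z) = -y and dz/dy = z * (1 - z),
    # the ascent direction is d(gain)/dw = y * z * (1 - z) * x,
    # which uses only quantities local to the node.
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            y = w @ x          # net input
            z = sigmoid(y)     # node output
            w += lr * y * z * (1.0 - z) * x
    return w

X, labels = make_two_clusters()
w = train_discriminant(X)
outputs = sigmoid(X @ w)
print("weights:", w)
print("mean output entropy:", binary_entropy(outputs).mean())
```

In this sketch the factor z(1 - z) shrinks the update once an input is confidently assigned to one side, so inputs near the decision boundary dominate learning. Which cluster maps to an output near 1 depends on the random initial weights.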
