We present an information-theoretic derivation of a learning algorithm that clusters unlabelled data with linear discriminants. In contrast to methods that try to preserve information about the input patterns, we maximize the information gained from observing the output of robust binary discriminators implemented with sigmoid nodes. We derive a local weight adaptation rule via gradient ascent on this objective, demonstrate its dynamics on some simple data sets, relate our approach to previous work, and suggest directions in which it may be extended.