Unsupervised neural network learning procedures for feature extraction and classification

In this article, we review unsupervised neural network learning procedures that can be used to preprocess raw data and extract useful features for subsequent classification. The learning algorithms reviewed here are grouped into three sections: information-preserving methods, density estimation methods, and feature extraction methods. Each of these major sections concludes with a discussion of successful applications of the methods to real-world problems.
