Source Separation as a By-Product of Regularization

This paper reveals a previously overlooked connection between two important fields: regularization and independent component analysis (ICA). We show that at least one representative of a broad class of algorithms (regularizers that reduce network complexity) extracts independent features as a by-product. This algorithm is Flat Minimum Search (FMS), a recent general method for finding low-complexity networks with high generalization capability. FMS works by minimizing both the training error and the precision (number of bits) required to specify the weights. According to our theoretical analysis, the hidden layer of an FMS-trained autoassociator attempts to code each input with a sparse code that uses as few simple features as possible. In experiments, the method extracts optimal codes for difficult versions of the "noisy bars" benchmark problem by separating the underlying sources, whereas ICA and PCA fail. Real-world images are coded with fewer bits per pixel than by ICA or PCA.
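The objective described above can be pictured as a standard reconstruction loss plus a penalty that rewards low required weight precision (flat minima). The sketch below is a minimal illustration of that idea for a tiny autoassociator in JAX; the simplified sensitivity-based surrogate penalty, the layer sizes, and all function names are assumptions chosen for illustration, not the authors' exact FMS regularizer.

```python
# Minimal sketch (assumed surrogate, not the exact FMS penalty):
# a tiny autoassociator trained on reconstruction error plus a
# "flatness"-style term that penalizes how sensitive the outputs are
# to each weight, so that low weight precision suffices.
import jax
import jax.numpy as jnp

def forward(params, x):
    W1, W2 = params
    h = jnp.tanh(W1 @ x)          # hidden (code) layer
    return W2 @ h                 # reconstruction of the input

def loss(params, x, lam=1e-3, eps=1e-8):
    recon = forward(params, x)
    train_err = jnp.sum((recon - x) ** 2)
    # Surrogate complexity term: log of summed squared output/weight
    # sensitivities per weight (a rough stand-in for required precision).
    jac = jax.jacrev(forward)(params, x)   # d outputs / d weights
    flat = sum(jnp.sum(jnp.log(eps + jnp.sum(j ** 2, axis=0))) for j in jac)
    return train_err + lam * flat

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
n_in, n_hid = 16, 8
params = [0.1 * jax.random.normal(k1, (n_hid, n_in)),
          0.1 * jax.random.normal(k2, (n_in, n_hid))]
x = jax.random.normal(k3, (n_in,))

grad_fn = jax.jit(jax.grad(loss))
for _ in range(100):              # plain gradient descent
    grads = grad_fn(params, x)
    params = [p - 0.01 * g for p, g in zip(params, grads)]
```

Under a sufficiently strong penalty of this kind, the analysis in the paper attributes the emergence of sparse codes built from few simple features, which is what yields the source separation observed in the experiments.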
