Adaptive Online Learning Algorithms for Blind Separation: Maximum Entropy and Minimum Mutual Information

There are two major approaches to blind separation: maximum entropy (ME) and minimum mutual information (MMI). Both can be implemented by stochastic gradient descent to obtain the demixing matrix. Mutual information (MI) is a contrast function for blind separation; entropy is not. To justify ME, we first elucidate the relation between ME and MMI by calculating the first derivative of the entropy, showing that mean subtraction is necessary when applying ME, and proving that, at the solution points determined by MI, ME does not update the demixing matrix in directions that increase cross-talk. Second, because the parameter space of demixing matrices is a Riemannian space, the natural gradient is introduced in place of the ordinary gradient to obtain efficient algorithms. The mutual information is evaluated by applying the Gram-Charlier expansion to approximate the probability density functions of the outputs. Finally, we propose an efficient learning algorithm that incorporates an adaptive method for estimating the unknown cumulants. Computer simulations show that the convergence of the stochastic descent algorithms is improved by using the natural gradient and the adaptively estimated cumulants.
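To make the natural-gradient idea concrete, below is a minimal Python/NumPy sketch of the online update W <- W + eta * (I - phi(y) y^T) W together with exponentially weighted running estimates of the third and fourth cumulants. The function name natural_gradient_bss, the step sizes eta and lam, and the simple cubic nonlinearity phi(y) = y^3 are illustrative assumptions; the paper derives phi from the Gram-Charlier expansion using the adaptively estimated cumulants, not the fixed cubic used here.

    import numpy as np

    def natural_gradient_bss(X, eta=0.01, lam=0.01, seed=0):
        """Sketch of an online natural-gradient blind-separation rule.

        X : (T, n) array of mixed observations, assumed zero-mean
            (the abstract notes that mean subtraction is necessary).
        """
        T, n = X.shape
        rng = np.random.default_rng(seed)
        W = np.eye(n) + 0.01 * rng.standard_normal((n, n))
        k3 = np.zeros(n)   # running third-cumulant estimates
        k4 = np.zeros(n)   # running fourth-cumulant estimates (unit-variance form)
        I = np.eye(n)
        for x in X:
            y = W @ x
            # adaptive (exponentially weighted) cumulant estimates
            k3 = (1 - lam) * k3 + lam * y**3
            k4 = (1 - lam) * k4 + lam * (y**4 - 3.0)
            # illustrative nonlinearity; the paper's phi is derived from the
            # Gram-Charlier expansion in terms of k3 and k4
            phi = y**3
            # natural-gradient update: W <- W + eta * (I - phi(y) y^T) W
            W = W + eta * (I - np.outer(phi, y)) @ W
        return W, k3, k4

    # usage sketch: two sub-Gaussian sources (cubic phi suits negative kurtosis)
    T = 20000
    rng = np.random.default_rng(1)
    S = rng.uniform(-1.0, 1.0, size=(T, 2))
    S -= S.mean(axis=0)                     # mean subtraction, per the abstract
    A = rng.standard_normal((2, 2))         # unknown mixing matrix
    W, _, _ = natural_gradient_bss(S @ A.T)
    print(W @ A)                            # near a scaled permutation on success

Right-multiplying the gradient term by W is what distinguishes the natural gradient from the ordinary one here: it accounts for the Riemannian geometry of the matrix parameter space and makes the update equivariant, which is the source of the improved convergence reported in the simulations.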
