Online Independent Component Analysis with Local Learning Rate Adaptation

Stochastic meta-descent (SMD) is a new technique for online adaptation of local learning rates in arbitrary twice-differentiable systems. Like matrix momentum, it uses full second-order information while retaining O(n) computational complexity by exploiting the efficient computation of Hessian-vector products. Here we apply SMD to independent component analysis, and employ the resulting algorithm for the blind separation of time-varying mixtures. By matching individual learning rates to the rate of change in each source signal's mixture coefficients, our technique is capable of simultaneously tracking sources that move at very different, a priori unknown speeds.
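
As a rough illustration of the update rules the abstract refers to, here is a minimal sketch (not the authors' code) of SMD-style local learning rate adaptation driven by an online ICA gradient, written in JAX so the Hessian-vector product can be taken exactly in a single forward-over-reverse pass. The log-cosh source prior, the use of the plain rather than natural ICA gradient, and every hyperparameter value (mu, lam, the initial gains) are assumptions chosen for illustration, not the paper's exact formulation; only the SMD overhead, which stays linear in the number of adapted parameters, is the point being sketched.

```python
# Minimal sketch (assumed, not the authors' code) of stochastic meta-descent
# applied to an online ICA gradient. The Hessian-vector product H v is taken
# exactly via forward-over-reverse differentiation (Pearlmutter's trick), so
# the SMD overhead stays linear in the number of adapted parameters.
# All hyperparameter values below are illustrative assumptions.
import jax
import jax.numpy as jnp

def ica_loss(W, x):
    """Per-sample infomax / maximum-likelihood ICA loss with a sech source
    prior: -log|det W| + sum_i log cosh((W x)_i)."""
    y = W @ x
    return -jnp.linalg.slogdet(W)[1] + jnp.sum(jnp.log(jnp.cosh(y)))

def smd_step(W, p, v, x, mu=1e-3, lam=0.99):
    """One SMD update: W is the unmixing matrix, p the element-wise
    learning rates (gains), v the trace dW/d(ln p)."""
    grad_fn = lambda W_: jax.grad(ica_loss)(W_, x)
    g, Hv = jax.jvp(grad_fn, (W,), (v,))        # gradient and H v in one pass
    p = p * jnp.maximum(0.5, 1.0 - mu * g * v)  # linearised exponentiated-gradient gain update
    W = W - p * g                               # gradient step with local gains
    v = lam * v - p * (g + lam * Hv)            # update the gain trace
    return W, p, v

# Illustrative usage: unmix a synthetic, stationary 2x2 mixture of
# super-Gaussian (Laplacian) sources.
key = jax.random.PRNGKey(0)
A = jax.random.normal(key, (2, 2))   # unknown mixing matrix
W = jnp.eye(2)                       # unmixing matrix
p = jnp.full((2, 2), 1e-2)           # initial local learning rates (assumed value)
v = jnp.zeros((2, 2))                # SMD trace

step = jax.jit(smd_step)
for t in range(20000):
    s = jax.random.laplace(jax.random.fold_in(key, t), (2,))
    W, p, v = step(W, p, v, A @ s)
print(W @ A)  # approximately a scaled permutation if separation succeeded
```

The gain update multiplies each local learning rate by max(1/2, 1 - mu*g*v): a gain grows where successive gradients keep pushing its weight in the same direction and shrinks where they oscillate, which is what allows mixture coefficients that drift at different, unknown speeds to be tracked with different effective step sizes.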
