Stochastic Approximation for Online Tensorial Independent Component Analysis

Independent component analysis (ICA) has been a popular dimension reduction tool in statistical machine learning and signal processing. In this paper, we present a convergence analysis for an online tensorial ICA algorithm by viewing the problem as a nonconvex stochastic approximation problem. For estimating one component, we provide a dynamics-based analysis showing that our online tensorial ICA algorithm, with a specific choice of stepsize, achieves a sharp finite-sample error bound. In particular, under a mild assumption on the data-generating distribution and a scaling condition requiring that d/T be sufficiently small up to a polylogarithmic factor in the data dimension d and the sample size T, we obtain a sharp finite-sample error bound of Õ(√(d/T)).
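The abstract does not reproduce the update rule itself, so the sketch below illustrates the kind of stochastic approximation iteration it describes: a projected stochastic gradient step on a fourth-moment (kurtosis-style) contrast, with the iterate renormalized onto the unit sphere after each fresh sample. The function name `online_ica_one_component`, the constant stepsize `eta`, and the Laplace toy data are illustrative assumptions, not the paper's exact algorithm or stepsize schedule.

```python
import numpy as np


def online_ica_one_component(samples, eta):
    """Hypothetical sketch: estimate one independent component online.

    Each iteration consumes one fresh sample x_t, takes a stochastic
    gradient ascent step on the fourth-moment contrast
    f(u) = E[(u^T x)^4] / 4, and projects back onto the unit sphere.
    For whitened data with super-Gaussian (positive excess kurtosis)
    sources, the maximizers of f on the sphere are the true components.
    """
    d = samples.shape[1]
    rng = np.random.default_rng(0)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)              # random initialization on S^{d-1}
    for x in samples:                   # one pass: T samples, T updates
        u = u + eta * (u @ x) ** 3 * x  # grad of (u^T x)^4 / 4 is (u^T x)^3 x
        u /= np.linalg.norm(u)          # retraction onto the unit sphere
    return u


if __name__ == "__main__":
    # Toy check: unit-variance Laplace sources (excess kurtosis 3), already
    # independent across coordinates, so the components are the axes e_i.
    d, T = 10, 200_000
    rng = np.random.default_rng(1)
    X = rng.laplace(scale=1.0 / np.sqrt(2.0), size=(T, d))
    u_hat = online_ica_one_component(X, eta=1e-3)
    print("largest |coordinate| of estimate:", np.abs(u_hat).max())
```

With enough samples and a small stepsize, the largest coordinate of `u_hat` should approach 1, indicating alignment with one axis; the Õ(√(d/T)) rate in the abstract quantifies how fast the residual misalignment shrinks under the authors' stepsize choice.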
