Stochastic Approximation for Online Tensorial Independent Component Analysis

Independent component analysis (ICA) has been a popular dimension reduction tool in statistical machine learning and signal processing. In this paper, we present a convergence analysis for an online tensorial ICA algorithm by viewing the problem as a nonconvex stochastic approximation problem. For estimating one component, we provide a dynamics-based analysis showing that our online tensorial ICA algorithm, with a specific choice of stepsize, achieves a sharp finite-sample error bound. In particular, under a mild assumption on the data-generating distribution and a scaling condition requiring that d/T be sufficiently small up to a polylogarithmic factor in the data dimension d and the sample size T, we obtain a sharp finite-sample error bound of Õ(√(d/T)).
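The abstract does not reproduce the update rule itself, so the sketch below illustrates the kind of stochastic approximation iteration it describes: a projected stochastic gradient step on a fourth-moment (kurtosis-style) contrast, with the iterate renormalized onto the unit sphere after each fresh sample. The function name `online_ica_one_component`, the constant stepsize `eta`, and the Laplace toy data are illustrative assumptions, not the paper's exact algorithm or stepsize schedule.

```python
import numpy as np


def online_ica_one_component(samples, eta):
    """Hypothetical sketch: estimate one independent component online.

    Each iteration consumes one fresh sample x_t, takes a stochastic
    gradient ascent step on the fourth-moment contrast
    f(u) = E[(u^T x)^4] / 4, and projects back onto the unit sphere.
    For whitened data with super-Gaussian (positive excess kurtosis)
    sources, the maximizers of f on the sphere are the true components.
    """
    d = samples.shape[1]
    rng = np.random.default_rng(0)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)              # random initialization on S^{d-1}
    for x in samples:                   # one pass: T samples, T updates
        u = u + eta * (u @ x) ** 3 * x  # grad of (u^T x)^4 / 4 is (u^T x)^3 x
        u /= np.linalg.norm(u)          # retraction onto the unit sphere
    return u


if __name__ == "__main__":
    # Toy check: unit-variance Laplace sources (excess kurtosis 3), already
    # independent across coordinates, so the components are the axes e_i.
    d, T = 10, 200_000
    rng = np.random.default_rng(1)
    X = rng.laplace(scale=1.0 / np.sqrt(2.0), size=(T, d))
    u_hat = online_ica_one_component(X, eta=1e-3)
    print("largest |coordinate| of estimate:", np.abs(u_hat).max())
```

With enough samples and a small stepsize, the largest coordinate of `u_hat` should approach 1, indicating alignment with one axis; the Õ(√(d/T)) rate in the abstract quantifies how fast the residual misalignment shrinks under the authors' stepsize choice.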
