On the Size of the Online Kernel Sparsification Dictionary

We analyze the size of the dictionary constructed by online kernel sparsification, using a novel formula that expresses the expected determinant of the kernel Gram matrix in terms of the eigenvalues of the covariance operator. This formula lets us connect the cardinality of the dictionary with the eigenvalue decay of the covariance operator. In particular, we show that, under certain technical conditions, the size of the dictionary always grows sublinearly in the number of data points and, as a consequence, the kernel linear regressor constructed from the resulting dictionary is consistent.
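To make the object of study concrete, the following is a minimal sketch of one common online sparsification rule, the approximate linear dependence (ALD) test: a new point joins the dictionary only if the squared residual of projecting its feature vector onto the span of the current dictionary exceeds a threshold. The abstract does not state which rule or kernel is analyzed, so the ALD criterion, the Gaussian kernel, and all names below (`gaussian_kernel`, `solve`, `build_dictionary`, `threshold`, `bandwidth`) are assumptions made for illustration only.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel on real numbers (an illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def build_dictionary(points, kernel, threshold):
    """Process points one at a time, keeping those that pass the ALD test."""
    dictionary = []
    for x in points:
        if not dictionary:
            dictionary.append(x)
            continue
        # Gram matrix of the current dictionary and kernel vector of x.
        gram = [[kernel(a, b) for b in dictionary] for a in dictionary]
        k_vec = [kernel(a, x) for a in dictionary]
        coeffs = solve(gram, k_vec)
        # Squared distance from the feature of x to the dictionary span.
        residual = kernel(x, x) - sum(c * k for c, k in zip(coeffs, k_vec))
        if residual > threshold:
            dictionary.append(x)
    return dictionary


if __name__ == "__main__":
    random.seed(0)
    kernel = lambda a, b: gaussian_kernel(a, b, bandwidth=0.2)
    for n in (50, 200, 800):
        sample = [random.random() for _ in range(n)]
        print(n, len(build_dictionary(sample, kernel, threshold=0.01)))
```

This sketch recomputes the Gram matrix and solves the linear system from scratch at every step for readability; a practical implementation would instead update the inverse Gram matrix incrementally. Running the loop at the bottom shows how the dictionary size changes as the number of data points grows, which is the quantity the analysis bounds.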
