Linear regression with random projections

We investigate a method for regression that makes use of a randomly generated subspace GP ⊂ F (of finite dimension P) of a given large (possibly infinite-dimensional) function space F, for example L2([0,1]^d; R). GP is defined as the span of P random features, which are linear combinations of basis functions of F weighted by i.i.d. Gaussian coefficients. We give practical motivation for the use of this approach, detail the links that this random projection method shares with RKHS and Gaussian objects theory, prove approximation error bounds, in both deterministic and random design, when searching for the best regression function in GP rather than in F, and derive excess risk bounds for a specific regression algorithm (least squares regression in GP). This paper stresses the motivation for studying such methods; the analysis is therefore kept simple for expository purposes and leaves room for future developments.
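As a concrete illustration of the approach described above, here is a minimal sketch of least-squares regression in a randomly generated subspace GP: P random features are built as Gaussian linear combinations of basis functions of F, and ordinary least squares is solved in their span. The truncated sine basis, the synthetic data, and the dimensions K, P, n are illustrative assumptions, not specifications from the paper.

import numpy as np

rng = np.random.default_rng(0)

K = 200   # dimension of the truncated basis of F (assumed for illustration)
P = 20    # dimension of the random subspace GP
n = 100   # number of samples

def basis(x, K):
    """Evaluate K sine basis functions of F at the points x (shape (n,))."""
    k = np.arange(1, K + 1)
    return np.sqrt(2) * np.sin(np.pi * np.outer(x, k))   # shape (n, K)

# Random features: P linear combinations of the basis functions of F,
# weighted by i.i.d. Gaussian coefficients (scaled by 1/sqrt(K)).
A = rng.standard_normal((K, P)) / np.sqrt(K)

# Synthetic regression data (illustration only).
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

# Design matrix of the random features at the sample points,
# followed by ordinary least squares in GP.
Phi = basis(x, K) @ A                                  # shape (n, P)
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)

def predict(x_new):
    """Least-squares estimate in GP evaluated at new points."""
    return basis(np.atleast_1d(x_new), K) @ A @ beta

The point of the sketch is that the regression is solved in the P-dimensional random subspace rather than in the K-dimensional (or infinite-dimensional) space F, so the least-squares problem involves only a P-column design matrix.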
