Scrambled Objects for Least-Squares Regression

We consider least-squares regression using a randomly generated subspace G_P ⊂ F of finite dimension P, where F is a function space of infinite dimension, e.g. L_2([0,1]^d). G_P is defined as the span of P random features, each a linear combination of the basis functions of F weighted by i.i.d. Gaussian coefficients. In particular, we consider multi-resolution random combinations at all scales of a given mother function, such as a hat function or a wavelet. In the latter case, the resulting Gaussian objects are called scrambled wavelets, and we show that they make it possible to approximate functions in the Sobolev spaces H^s([0,1]^d). As a result, given N data, the least-squares estimate ĝ built from P scrambled wavelets has excess risk ‖f* − ĝ‖²_P = O(‖f*‖²_{H^s([0,1]^d)} (log N)/P + P (log N)/N) for target functions f* ∈ H^s([0,1]^d) of smoothness order s > d/2. An interesting aspect of these bounds is that they do not depend on the distribution P from which the data are generated, which is important in the statistical regression setting considered here: randomization makes it possible to adapt to any possible distribution. We conclude by describing an efficient numerical implementation using lazy expansions, with numerical complexity O(2^d N^{3/2} log N + N²), where d is the dimension of the input space.
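
To make the construction concrete, here is a minimal Python sketch in dimension d = 1 (the paper treats general d). The hat mother function, the number of scales, and in particular the per-scale damping 2^(-js) of the Gaussian coefficients are illustrative assumptions: the abstract only states that the features are i.i.d.-Gaussian combinations of the basis functions at all scales, not the exact coefficient scaling, and the lazy-expansion scheme behind the O(2^d N^{3/2} log N + N²) complexity is not reproduced here.

```python
import numpy as np

def hat(x):
    """Triangular 'hat' mother function, supported on [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(x))

def basis_matrix(x, n_scales):
    """Evaluate the multi-resolution hat basis phi_{j,k}(t) = hat(2^j t - k)
    at the points x. Columns are ordered by scale j, then translation k."""
    cols = [hat(2 ** j * x - k)
            for j in range(n_scales)
            for k in range(2 ** j + 1)]
    return np.stack(cols, axis=1)  # shape (len(x), n_basis)

def make_scrambled_features(P, n_scales, s=1.0, seed=0):
    """Return a feature map x -> (len(x), P) matrix of 'scrambled' features:
    each feature is a Gaussian random combination of all basis functions,
    with coefficients at scale j damped by 2^(-j*s). This damping schedule
    is an assumption made for the sketch, chosen so the random objects
    behave like elements of a Sobolev-type ball."""
    rng = np.random.default_rng(seed)
    damp = np.concatenate([np.full(2 ** j + 1, 2.0 ** (-j * s))
                           for j in range(n_scales)])
    W = damp[:, None] * rng.standard_normal((damp.size, P))
    return lambda x: basis_matrix(x, n_scales) @ W

# Toy regression on [0, 1]: ordinary least squares in the random
# subspace G_P spanned by the P scrambled features.
rng = np.random.default_rng(1)
N, P = 200, 40
x_train = rng.uniform(0.0, 1.0, size=N)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(N)

features = make_scrambled_features(P=P, n_scales=6, s=1.0)
Phi = features(x_train)                                  # (N, P) design matrix
beta, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)     # least-squares fit

x_test = np.linspace(0.0, 1.0, 400)
y_hat = features(x_test) @ beta                          # estimate ĝ on a grid
```

Note that the Gaussian coefficients W are drawn once and reused, so training and test points are mapped into the same random subspace G_P; the sketch evaluates features naively in O(N · n_basis) rather than via the paper's lazy expansions.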
