Scrambled Objects for Least-Squares Regression

We consider least-squares regression using a randomly generated subspace G_P ⊂ F of finite dimension P, where F is a function space of infinite dimension, e.g. L_2([0,1]^d). G_P is defined as the span of P random features, each a linear combination of the basis functions of F weighted by i.i.d. Gaussian coefficients. In particular, we consider multi-resolution random combinations at all scales of a given mother function, such as a hat function or a wavelet. In the latter case, the resulting Gaussian objects are called scrambled wavelets, and we show that they make it possible to approximate functions in the Sobolev spaces H^s([0,1]^d). As a result, given N data, the least-squares estimate ĝ built from P scrambled wavelets has excess risk ‖f* − ĝ‖²_P = O(‖f*‖²_{H^s([0,1]^d)} (log N)/P + P (log N)/N) for target functions f* ∈ H^s([0,1]^d) of smoothness order s > d/2. An interesting aspect of these bounds is that they do not depend on the distribution P from which the data are generated, which is important in the statistical regression setting considered here: randomization makes it possible to adapt to any possible distribution. We conclude by describing an efficient numerical implementation using lazy expansions, with numerical complexity O(2^d N^{3/2} log N + N²), where d is the dimension of the input space.
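
To make the construction concrete, here is a minimal Python sketch in dimension d = 1 (the paper treats general d). The hat mother function, the number of scales, and in particular the per-scale damping 2^(-js) of the Gaussian coefficients are illustrative assumptions: the abstract only states that the features are i.i.d.-Gaussian combinations of the basis functions at all scales, not the exact coefficient scaling, and the lazy-expansion scheme behind the O(2^d N^{3/2} log N + N²) complexity is not reproduced here.

```python
import numpy as np

def hat(x):
    """Triangular 'hat' mother function, supported on [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(x))

def basis_matrix(x, n_scales):
    """Evaluate the multi-resolution hat basis phi_{j,k}(t) = hat(2^j t - k)
    at the points x. Columns are ordered by scale j, then translation k."""
    cols = [hat(2 ** j * x - k)
            for j in range(n_scales)
            for k in range(2 ** j + 1)]
    return np.stack(cols, axis=1)  # shape (len(x), n_basis)

def make_scrambled_features(P, n_scales, s=1.0, seed=0):
    """Return a feature map x -> (len(x), P) matrix of 'scrambled' features:
    each feature is a Gaussian random combination of all basis functions,
    with coefficients at scale j damped by 2^(-j*s). This damping schedule
    is an assumption made for the sketch, chosen so the random objects
    behave like elements of a Sobolev-type ball."""
    rng = np.random.default_rng(seed)
    damp = np.concatenate([np.full(2 ** j + 1, 2.0 ** (-j * s))
                           for j in range(n_scales)])
    W = damp[:, None] * rng.standard_normal((damp.size, P))
    return lambda x: basis_matrix(x, n_scales) @ W

# Toy regression on [0, 1]: ordinary least squares in the random
# subspace G_P spanned by the P scrambled features.
rng = np.random.default_rng(1)
N, P = 200, 40
x_train = rng.uniform(0.0, 1.0, size=N)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(N)

features = make_scrambled_features(P=P, n_scales=6, s=1.0)
Phi = features(x_train)                                  # (N, P) design matrix
beta, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)     # least-squares fit

x_test = np.linspace(0.0, 1.0, 400)
y_hat = features(x_test) @ beta                          # estimate ĝ on a grid
```

Note that the Gaussian coefficients W are drawn once and reused, so training and test points are mapped into the same random subspace G_P; the sketch evaluates features naively in O(N · n_basis) rather than via the paper's lazy expansions.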
