Correcting the bias in least squares regression with volume-rescaled sampling

Consider linear regression where the examples are generated by an unknown distribution on $\mathbb{R}^d \times \mathbb{R}$. Without any assumptions on the noise, the linear least squares solution for any i.i.d. sample will typically be biased w.r.t. the least squares optimum over the entire distribution. However, we show that if an i.i.d. sample of any size $k$ is augmented by a certain small additional sample, then the solution computed from the combined sample becomes unbiased. We show this when the additional sample consists of $d$ points drawn jointly according to the input distribution rescaled by the squared volume spanned by the points. Furthermore, we propose algorithms for sampling from this volume-rescaled distribution when the data distribution is only known through an i.i.d. sample.
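The construction can be illustrated concretely. Below is a minimal sketch (not the paper's algorithm): it approximates volume-rescaled sampling by drawing a finite pool of i.i.d. points and selecting $d$ of them with probability proportional to the squared volume they span, then solves least squares on the union of a plain i.i.d. sample and the volume-sampled points. The pool size, the toy misspecified data model, and all names are illustrative assumptions; the exhaustive subset enumeration is only feasible for tiny pools.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
d, k, n_pool = 3, 10, 12  # dimension, i.i.d. sample size, pool size (all assumed)

def draw_sample(n):
    """Draw n i.i.d. examples (x, y) from a toy misspecified model, so the
    least squares fit of a finite i.i.d. sample is biased w.r.t. the
    population least squares optimum."""
    X = rng.standard_normal((n, d))
    w_star = np.ones(d)
    y = X @ w_star + X[:, 0] ** 2  # nonlinear target: no linear model is exact
    return X, y

def volume_sample(X_pool, y_pool):
    """Pick d pool points jointly, with probability proportional to the
    squared volume det(X_S)^2 spanned by the chosen rows."""
    subsets = list(itertools.combinations(range(len(X_pool)), d))
    weights = np.array([np.linalg.det(X_pool[list(S)]) ** 2 for S in subsets])
    S = subsets[rng.choice(len(subsets), p=weights / weights.sum())]
    return X_pool[list(S)], y_pool[list(S)]

# i.i.d. sample of size k, augmented by d volume-sampled points from a fresh pool
X_iid, y_iid = draw_sample(k)
X_vol, y_vol = volume_sample(*draw_sample(n_pool))
X_aug = np.vstack([X_iid, X_vol])
y_aug = np.concatenate([y_iid, y_vol])

# least squares solution on the combined sample
w_hat, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
print(w_hat)
```

Averaging `w_hat` over many independent runs of such a procedure is what the unbiasedness claim concerns; the exact distribution being sampled, and efficient samplers for it, are developed in the paper.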
