Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression

In experimental design, we are given a large collection of vectors, each with a hidden response value that we assume derives from an underlying linear model, and we wish to pick a small subset of the vectors such that querying the corresponding responses will lead to a good estimator of the model. A classical approach in statistics is to assume that the responses are a linear function of the vectors plus zero-mean i.i.d. Gaussian noise, in which case the goal is to provide an unbiased estimator with smallest mean squared error (A-optimal design). A related approach, more common in computer science, is to assume the responses are arbitrary but fixed, in which case the goal is to estimate the least squares solution using few responses, as quickly as possible, for worst-case inputs. Despite many attempts, characterizing the relationship between these two approaches has proven elusive. We address this by proposing a framework for experimental design where the responses are produced by an arbitrary unknown distribution. We show that there is an efficient randomized experimental design procedure that achieves strong variance bounds for an unbiased estimator using few responses in this general model. Nearly tight bounds for the classical A-optimality criterion, as well as improved bounds for worst-case responses, emerge as special cases of this result. In the process, we develop a new algorithm for a joint sampling distribution called volume sampling, and we propose a new i.i.d. importance sampling method: inverse score sampling. A key novelty of our analysis lies in developing new expected error bounds for worst-case regression by controlling the tail behavior of i.i.d. sampling via the jointness of volume sampling. Our result motivates a new minimax-optimality criterion for experimental design that can be viewed as an extension of both A-optimal design and sampling for worst-case regression.
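
To make the two criteria concrete, here is a minimal NumPy sketch contrasting the quantities the two approaches optimize (the data X, responses y, and subset S below are hypothetical and chosen only for illustration; this is not the sampling procedure proposed in the paper). Under the statistical model, A-optimal design seeks a subset S minimizing tr((X_S^T X_S)^{-1}), the total variance of the unbiased least-squares estimator under i.i.d. Gaussian noise; under the worst-case view, one compares the least-squares solution computed from the subset against the full least-squares solution, measured in the loss of the full problem.

```python
import numpy as np

def a_optimality(X, S):
    """A-optimality criterion for subset S: trace of the inverse information
    matrix, proportional to the total variance (mean squared error) of the
    unbiased least-squares estimator under i.i.d. Gaussian noise."""
    Xs = X[S]
    return np.trace(np.linalg.inv(Xs.T @ Xs))

def worst_case_excess_loss(X, y, S):
    """Worst-case view: responses y are arbitrary but fixed; compare the
    least-squares solution fit on the subset against the full least-squares
    solution, both evaluated on the full squared loss."""
    w_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    w_sub, *_ = np.linalg.lstsq(X[S], y[S], rcond=None)
    loss = lambda w: np.sum((X @ w - y) ** 2)
    return loss(w_sub) - loss(w_full)

# Toy example: evaluate both criteria for one random subset of size k.
rng = np.random.default_rng(0)
n, d, k = 200, 5, 20
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
S = rng.choice(n, size=k, replace=False)
print("A-optimality value:", a_optimality(X, S))
print("excess worst-case loss:", worst_case_excess_loss(X, y, S))
```

Note that the A-optimality value depends only on the selected vectors X_S, whereas the worst-case excess loss also depends on the responses y; bridging these two notions of quality for a single subset-selection procedure is exactly the gap the paper addresses.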
