Paper Information - Bootstrapping Big Data

Bootstrapping Big Data

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively computationally demanding. As an alternative, we introduce the Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to obtain a more computationally efficient, though still robust, means of quantifying the quality of estimators. BLB shares the generic applicability and statistical efficiency of the bootstrap and is furthermore well suited for application to very large datasets using modern distributed computing architectures, as it uses only small subsets of the observed data at any point during its execution. We provide both empirical and theoretical results which demonstrate the efficacy of BLB.
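The abstract describes BLB only at a high level, so the following is a minimal sketch of how such a procedure might look, not the authors' implementation. It assumes the commonly described form of BLB: draw several small subsets without replacement, bootstrap each one up to the full sample size by means of multinomial counts, and average the resulting quality estimates. The function names, the subset-size exponent `gamma`, the default counts, and the choice of standard error as the quality measure are all illustrative assumptions.

```python
import numpy as np


def blb_standard_error(data, estimator, n_subsets=10, n_resamples=50,
                       gamma=0.7, seed=0):
    """Sketch of a BLB-style standard-error estimate (hypothetical helper).

    `estimator(values, counts)` must accept a 1-D array of distinct points
    and an integer count for each point.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # Subset size b = n**gamma (assumed form); only b points are held at once.
    b = min(n, max(2, int(n ** gamma)))

    subset_errors = []
    for _ in range(n_subsets):
        # Subsampling step: b distinct points drawn without replacement.
        subset = data[rng.choice(n, size=b, replace=False)]
        estimates = []
        for _ in range(n_resamples):
            # Bootstrap step: a resample of nominal size n from the subset,
            # represented as counts so that no size-n array is materialised.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            estimates.append(estimator(subset, counts))
        # Quality measure for this subset: spread of the resampled estimates.
        subset_errors.append(np.std(estimates, ddof=1))

    # Combine the per-subset quality estimates by averaging.
    return float(np.mean(subset_errors))


def weighted_mean(values, counts):
    """Sample mean of a resample given as (distinct values, counts)."""
    return float(np.average(values, weights=counts))
```

For example, `blb_standard_error(x, weighted_mean)` on a one-dimensional sample `x` should return a value near the classical standard error of the mean, while each inner step handles only about `n ** gamma` distinct points.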

Purnamrita Sarkar | Ameet Talwalkar | Michael I. Jordan | Ariel Kleiner
