The Big Data Bootstrap

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively demanding. As an alternative, we present the Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing estimator quality. BLB is well suited to modern parallel and distributed computing architectures and retains the generic applicability, statistical efficiency, and favorable theoretical properties of the bootstrap. We provide the results of an extensive empirical and theoretical investigation of BLB's behavior, including a study of its statistical correctness, its large-scale implementation and performance, selection of hyperparameters, and performance on real data.
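The abstract describes BLB only at a high level. Below is a minimal Python sketch that assumes the usual BLB formulation. It draws s subsets of size b = n^γ without replacement. On each subset it runs r bootstrap resamples of full size n, with each resample stored as multinomial counts over the subset's b points. It computes a quality measure per subset and then averages those measures. Several pieces are illustrative assumptions and do not come from the abstract: the estimator (the sample mean), the quality measure (standard error), the values γ = 0.6, s = 10 and r = 50, and the helper names `weighted_mean` and `blb_stderr`.

```python
import math
import random
import statistics
from collections import Counter


def weighted_mean(points, weights):
    """Estimator under assessment (assumed here: the mean), computed on
    b distinct points with integer multiplicities instead of n raw values."""
    total = sum(weights)
    return sum(x * w for x, w in zip(points, weights)) / total


def blb_stderr(data, gamma=0.6, s=10, r=50, seed=0):
    """Hypothetical BLB sketch: estimate the standard error of the mean.

    gamma, s and r are illustrative hyperparameter choices, not values
    prescribed by the abstract.
    """
    rng = random.Random(seed)
    n = len(data)
    b = min(n, max(2, int(n ** gamma)))  # little subset size b = n^gamma << n
    per_subset = []
    for _ in range(s):
        # Draw one little subset of b points without replacement.
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(r):
            # Resample n points (not b) from the subset. The counts form a
            # Multinomial(n, uniform over b) vector. Lacking a stdlib
            # multinomial sampler, this sketch draws n indices and counts them.
            counts = Counter(rng.choices(range(b), k=n))
            weights = [counts[i] for i in range(b)]
            estimates.append(weighted_mean(subset, weights))
        # Quality assessment on this subset: spread of its r estimates.
        per_subset.append(statistics.stdev(estimates))
    # Final BLB output: average of the s per-subset assessments.
    return statistics.fmean(per_subset)


if __name__ == "__main__":
    gen = random.Random(42)
    n = 5000
    data = [gen.gauss(0.0, 1.0) for _ in range(n)]
    print("BLB standard error: ", blb_stderr(data))
    print("Analytic 1/sqrt(n): ", 1.0 / math.sqrt(n))
```

Each resample is stored as b counts rather than n data points, so the estimator only ever touches b distinct values. A larger-scale implementation would draw the count vector directly from a multinomial sampler, such as NumPy's, instead of drawing n indices as this stdlib-only sketch does. The s subsets are processed independently, which is what makes the procedure straightforward to parallelize.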
