Communication-efficient distributed statistical learning

We present the Communication-efficient Surrogate Likelihood (CSL) framework for solving distributed statistical learning problems. CSL provides a communication-efficient surrogate for the global likelihood that can be used for low-dimensional estimation, high-dimensional regularized estimation, and Bayesian inference. For low-dimensional estimation, CSL provably improves upon naive averaging schemes and facilitates the construction of confidence intervals. For high-dimensional regularized estimation, CSL leads to a minimax-optimal estimator with minimal communication cost. For Bayesian inference, CSL can be used to form a communication-efficient quasi-posterior distribution that converges to the true posterior. This quasi-posterior procedure significantly improves the computational efficiency of MCMC algorithms even in a non-distributed setting. We illustrate the methods through empirical studies.
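To make the construction concrete, here is a minimal sketch of the CSL surrogate for distributed logistic regression. With data split across k machines, local empirical loss L_1 on the first machine, global loss L_N, and a current iterate θ̄, the surrogate is L̃(θ) = L_1(θ) − ⟨∇L_1(θ̄) − ∇L_N(θ̄), θ⟩; evaluating it requires only machine 1's data plus one communicated d-dimensional gradient per machine per round. Everything below (the function names, the gradient-descent inner solver, the zero initialization) is an illustrative assumption, not the paper's implementation.

import numpy as np

def logistic_loss_grad(theta, X, y):
    # Average negative log-likelihood for labels y in {0, 1} and its gradient.
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def csl_surrogate_grad(theta, theta_bar, X1, y1, global_grad):
    # Gradient of the surrogate L1(theta) - <grad L1(theta_bar) - grad L_N(theta_bar), theta>.
    # Needs only machine 1's data and the single communicated vector global_grad.
    _, grad1 = logistic_loss_grad(theta, X1, y1)
    _, grad1_bar = logistic_loss_grad(theta_bar, X1, y1)
    return grad1 - (grad1_bar - global_grad)

def csl_round(theta_bar, X_parts, y_parts, lr=0.5, n_steps=500):
    # One CSL round: every machine sends its local gradient at theta_bar
    # (the only communication); machine 1 then minimizes the surrogate locally.
    grads = [logistic_loss_grad(theta_bar, X, y)[1] for X, y in zip(X_parts, y_parts)]
    global_grad = np.mean(grads, axis=0)
    theta = theta_bar.copy()
    for _ in range(n_steps):
        theta = theta - lr * csl_surrogate_grad(theta, theta_bar,
                                                X_parts[0], y_parts[0], global_grad)
    return theta

if __name__ == "__main__":
    # Toy run: 4 machines, 500 samples each, 3 parameters.
    rng = np.random.default_rng(0)
    theta_true = np.array([1.0, -2.0, 0.5])
    X_parts, y_parts = [], []
    for _ in range(4):
        X = rng.normal(size=(500, 3))
        y = (rng.random(500) < 1.0 / (1.0 + np.exp(-(X @ theta_true)))).astype(float)
        X_parts.append(X)
        y_parts.append(y)
    theta = np.zeros(3)  # crude initializer; the paper starts from a local estimator
    for _ in range(3):   # a few CSL rounds, one gradient exchange each
        theta = csl_round(theta, X_parts, y_parts)
    print(theta)

Note that at θ = θ̄ the surrogate gradient equals the global gradient ∇L_N(θ̄), so iterating the round drives θ toward the global minimizer while each round communicates only O(d) numbers per machine rather than the raw data.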
