Optimality guarantees for distributed statistical estimation

Large data sets often require distributed statistical estimation, in which the full data set is split across multiple machines and communication between machines is limited. To study such scenarios, we define refinements of the classical minimax risk that apply to distributed settings and compare them to the performance of estimators with access to the entire data set. Lower bounds on these quantities provide a precise characterization of the minimum amount of communication required to achieve the centralized minimax risk. We study two classes of distributed protocols: one in which machines send messages independently over channels without feedback, and a second allowing for interactive communication, in which a central server broadcasts the messages from a given machine to all other machines. We establish lower bounds for a variety of problems, including location estimation in several families and parameter estimation in different types of regression models. Our results include a novel class of quantitative data-processing inequalities used to characterize the effects of limited communication.
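
For orientation, the two kinds of risk being compared can be sketched as follows. The notation here is assumed for illustration and is not taken from the abstract: $\mathcal{P}$ is a family of distributions, $\theta(P)$ the parameter of interest, $X_1, \dots, X_N$ the full sample, $m$ the number of machines, $Y_i$ the message sent by machine $i$, $B$ a total communication budget in bits, and $\mathcal{A}(B)$ the set of protocols respecting that budget. Squared error is used as the loss purely as an example.

```latex
% Classical (centralized) minimax risk: the estimator sees the whole sample.
\mathfrak{M}(\theta, \mathcal{P})
  = \inf_{\hat{\theta}} \, \sup_{P \in \mathcal{P}} \,
    \mathbb{E}_P\!\left[ \bigl\| \hat{\theta}(X_1, \dots, X_N) - \theta(P) \bigr\|_2^2 \right]

% Communication-constrained refinement (assumed form): the estimator sees
% only the messages Y_1, ..., Y_m produced by a protocol \Pi within budget B.
\mathfrak{M}(\theta, \mathcal{P}, B)
  = \inf_{\Pi \in \mathcal{A}(B)} \, \sup_{P \in \mathcal{P}} \,
    \mathbb{E}_P\!\left[ \bigl\| \hat{\theta}(Y_1, \dots, Y_m) - \theta(P) \bigr\|_2^2 \right]
```

Because an estimator built from the messages is itself a (possibly randomized) function of the full sample, the constrained risk can never fall below the centralized one. Under this reading, the lower bounds described above quantify how large $B$ must be before the two risks agree.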
