Automatic Differentiation of Sketched Regression

Sketching for speeding up regression problems involves using a sketching matrix S to quickly find an approximate solution to a linear least squares regression (LLS) problem: given A of size n × d, with n ≫ d, along with b of size n × 1, we seek a vector y with minimal regression error ‖Ay − b‖₂. This approximation technique is now standard in data science, and many software systems use sketched regression internally as a component. It is often useful to calculate derivatives (for example, gradients for optimization) of such large systems, in which sketched LLS is merely one component of a larger system whose derivatives are needed. To support Automatic Differentiation (AD) of systems containing sketched LLS, we consider propagating derivatives through LLS: both propagating perturbations (forward AD) and gradients (reverse AD). AD performs accurate differentiation and is efficient for problems with a huge number of independent variables. Since we use LLSS (sketched LLS) instead of LLS for reasons of efficiency, propagation of derivatives also needs to trade accuracy for efficiency, presumably by sketching. There are two approaches for this: (a) use AD to transform the code that defines LLSS, or (b) approximate exact derivative propagation through LLS using sketching methods. We provide strong bounds on the errors produced by these two natural forms of sketching in the context of AD, giving the first dimensionality-reduction analysis for calculating the derivatives of a sketched computation. Our results crucially depend on a novel analysis of the operator norm of a sketched inverse matrix product in this context. Extensive experiments on both synthetic and real-world datasets demonstrate the efficacy of our sketched gradients.
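The setup above can be illustrated with a minimal, hedged sketch (not the authors' implementation): assuming JAX for AD, a dense Gaussian sketching matrix S, and a normal-equations solve with A of full column rank, the snippet below applies reverse-mode AD directly to the code that defines the sketched solver (approach (a)), compares the resulting gradient with the gradient of the exact solve, and also propagates a perturbation forward through the sketched solve. All dimensions, function names, and the choice of sketch are illustrative assumptions.

```python
# Minimal illustrative sketch (assumptions: JAX for AD, dense Gaussian
# sketching matrix S, normal-equations solve, A with full column rank).
# This is not the paper's implementation.
import jax
import jax.numpy as jnp

def lls(A, b):
    # Exact LLS via the normal equations: y = (A^T A)^{-1} A^T b.
    return jnp.linalg.solve(A.T @ A, A.T @ b)

def sketched_lls(A, b, S):
    # Sketched LLS: solve the smaller m x d problem min ||SAy - Sb||_2.
    SA, Sb = S @ A, S @ b
    return jnp.linalg.solve(SA.T @ SA, SA.T @ Sb)

def loss_exact(A, b):
    # A scalar downstream quantity of a larger system built on LLS.
    return jnp.sum(lls(A, b) ** 2)

def loss_sketched(A, b, S):
    # Approach (a): apply AD directly to the code defining sketched LLS.
    return jnp.sum(sketched_lls(A, b, S) ** 2)

key = jax.random.PRNGKey(0)
n, d, m = 2000, 20, 200                          # n >> d; m is the sketch size
kA, kb, kS, kt = jax.random.split(key, 4)
A = jax.random.normal(kA, (n, d))
b = jax.random.normal(kb, (n,))
S = jax.random.normal(kS, (m, n)) / jnp.sqrt(m)  # Gaussian sketching matrix

# Reverse AD (gradients) through the exact and the sketched regression.
g_exact = jax.grad(loss_exact, argnums=1)(A, b)
g_sketch = jax.grad(loss_sketched, argnums=1)(A, b, S)
print("relative gradient error:",
      jnp.linalg.norm(g_sketch - g_exact) / jnp.linalg.norm(g_exact))

# Forward AD (perturbation propagation) through the sketched solve.
db = jax.random.normal(kt, (n,))                 # a perturbation of b
y, dy = jax.jvp(lambda bb: sketched_lls(A, bb, S), (b,), (db,))
```

In this toy setup, increasing the sketch size m should tighten the agreement between the sketched and exact gradients; quantifying that accuracy-versus-efficiency trade-off for both forms of sketching is what the error bounds described in the abstract address.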
