DiffSharp: Automatic Differentiation Library

In this paper we introduce DiffSharp, an automatic differentiation (AD) library designed with machine learning in mind. AD is a family of techniques that evaluate derivatives at machine precision with only a small constant factor of overhead, by systematically applying the chain rule of calculus at the elementary operator level. DiffSharp aims to make an extensive array of AD techniques available, in convenient form, to the machine learning community. These include arbitrary nesting of forward/reverse AD operations, AD with linear algebra primitives, and a functional API that emphasizes the use of higher-order functions and composition. The library exposes this functionality through operations that provide gradients, Hessians, Jacobians, directional derivatives, and matrix-free Hessian- and Jacobian-vector products. Bearing in mind the performance requirements of the latest machine learning techniques, the underlying computations are run through a high-performance BLAS/LAPACK backend, using OpenBLAS by default. GPU support is currently being implemented.
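
The operations listed above map naturally onto higher-order functions. Below is a minimal sketch of how such a functional API can be used, written against the DiffSharp 0.7-style F# interface as an assumption (the DiffSharp.AD.Float64 namespace, the D/DV/DM numeric types, the toDV conversion, and the grad, gradv, hessian, hessianv, and jacobianv operations); exact names and signatures may differ between releases.

    open DiffSharp.AD.Float64

    // Scalar-valued test function f : R^2 -> R
    let f (x: DV) = sin (x.[0] * x.[1]) + x.[0] * x.[0]

    // Vector-valued test function r : R^2 -> R^2
    let r (x: DV) = toDV [x.[0] * x.[1]; exp x.[0]]

    let x0 = toDV [1.0; 2.0]    // evaluation point
    let v  = toDV [0.5; -1.0]   // direction vector

    let g  = grad f x0          // gradient of f at x0
    let dd = gradv f x0 v       // directional derivative of f at x0 along v
    let h  = hessian f x0       // full Hessian of f at x0
    let hv = hessianv f x0 v    // Hessian-vector product, without forming the Hessian
    let jv = jacobianv r x0 v   // Jacobian-vector product, without forming the Jacobian

Because these operations are ordinary higher-order functions taking functions as arguments and returning derivative values, they can be composed and nested; for example, applying grad to a function whose body itself calls grad is the kind of nested forward/reverse use the abstract refers to.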
