Automatic differentiation in machine learning: a survey
Barak A. Pearlmutter | Jeffrey Mark Siskind | Atilim Gunes Baydin | Alexey Andreyevich Radul
[1] R. V. Gamkrelidze,et al. THE THEORY OF OPTIMAL PROCESSES. I. THE MAXIMUM PRINCIPLE , 1960 .
[2] A. E. Bryson,et al. A Steepest-Ascent Method for Solving Optimum Programming Problems , 1962 .
[3] R. E. Wengert,et al. A simple automatic derivative evaluation program , 1964, Commun. ACM.
[4] Arthur E. Bryson,et al. Applied Optimal Control , 1969 .
[5] Ludovít Molnár,et al. Analytical differentiation on a digital computer , 1970, Kybernetika.
[6] J. Meditch,et al. Applied optimal control , 1972, IEEE Transactions on Automatic Control.
[7] David Q. Mayne,et al. Differential dynamic programming , 1972, The Mathematical Gazette.
[8] F. L. Bauer. Computational Graphs and Rounding Error , 1974 .
[9] P. Werbos,et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .
[10] D. Peng,et al. A New Two-Constant Equation of State , 1976 .
[11] S. Linnainmaa. Taylor expansion of the accumulated rounding error , 1976 .
[12] Berthold K. P. Horn. Understanding Image Intensities , 1977, Artif. Intell..
[13] George M. Siouris,et al. Applied Optimal Control: Optimization, Estimation, and Control , 1979, IEEE Transactions on Systems, Man, and Cybernetics.
[14] D. J. Bell,et al. Numerical Methods for Unconstrained Optimization , 1979 .
[15] B. Speelpenning. Compiling Fast Partial Derivatives of Functions Given by Algorithms , 1980 .
[16] Bengt Fornberg,et al. Numerical Differentiation of Analytic Functions , 1981, TOMS.
[17] John E. Dennis,et al. Numerical methods for unconstrained optimization and nonlinear equations , 1983, Prentice Hall series in computational mathematics.
[18] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[19] Geoffrey E. Hinton,et al. A general framework for parallel distributed processing , 1986 .
[20] William H. Press,et al. Numerical Recipes: The Art of Scientific Computing , 1987 .
[21] S. Duane,et al. Hybrid Monte Carlo , 1987 .
[22] F W Pfeiffer,et al. Automatic differentiation in PROSE , 1987, SGNM.
[23] W. Press,et al. Numerical Recipes: The Art of Scientific Computing , 1987 .
[24] M. Bertero,et al. Ill-posed problems in early vision , 1988, Proc. IEEE.
[25] Léon Bottou,et al. Sn: A simulator for connectionist models , 1988 .
[26] Griewank,et al. On automatic differentiation , 1988 .
[27] R. D. Neidinger. Automatic Differentiation and APL , 1989 .
[28] John Peterson. Untagged data in tagged environments: choosing optimal representations at compile time , 1989, FPCA.
[29] Andrew W. Appel,et al. Runtime tags aren't necessary , 1989, LISP Symb. Comput..
[30] Robert Hecht-Nielsen,et al. Theory of the backpropagation neural network , 1989, International 1989 Joint Conference on Neural Networks.
[31] Bernard Widrow,et al. 30 years of adaptive neural networks: perceptron, Madaline, and backpropagation , 1990, Proc. IEEE.
[32] Olin Shivers,et al. Control-flow analysis of higher-order languages or taming lambda , 1991 .
[33] Simon L. Peyton Jones,et al. Unboxed Values as First Class Citizens in a Non-Strict Functional Language , 1991, FPCA.
[34] David W. Juedes,et al. A taxonomy of automatic differentiation tools , 1991 .
[35] Claude Brezinski,et al. Extrapolation methods - theory and practice , 1993, Studies in computational mathematics.
[36] Lawrence C. Rich,et al. Automatic differentiation in MATLAB , 1992 .
[37] Peter Sestoft,et al. Partial evaluation and automatic program generation , 1993, Prentice Hall international series in computer science.
[38] Brian W. Kernighan,et al. AMPL: A Modeling Language for Mathematical Programming , 1993 .
[39] Philip Wadler,et al. The Glasgow Haskell Compiler: a technical overview , 1993 .
[40] B. Christianson. Reverse accumulation and attractive fixed points , 1994 .
[41] R. L. Hinkins,et al. Parallel computation of automatic differentiation applied to magnetic field calculations , 1994 .
[42] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian , 1994, Neural Computation.
[43] Jean-Yves Audibert. Optimization for Machine Learning , 1995 .
[44] S. Chib,et al. Understanding the Metropolis-Hastings Algorithm , 1995 .
[45] Murray Hill. Automatically Finding and Exploiting Partially Separable Structure in Nonlinear Programming Problems , 1996 .
[46] Yann LeCun,et al. Transformation Invariance in Pattern Recognition-Tangent Distance and Tangent Propagation , 1996, Neural Networks: Tricks of the Trade.
[47] Christian Bischof,et al. Adifor 2.0: automatic differentiation of Fortran 77 programs , 1996 .
[48] C. Bendtsen. FADBAD, a flexible C++ package for automatic differentiation - using the forward and backward method , 1996 .
[49] M. Berz,et al. COSY INFINITY and Its Applications in Nonlinear Dynamics , 1996 .
[50] D. Gay. Automatically Finding and Exploiting Partially Separable Structure in Nonlinear Programming Problems , 1996 .
[51] C. Bert,et al. Differential Quadrature Method in Computational Mechanics: A Review , 1996 .
[52] Jorge Nocedal,et al. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization , 1997, TOMS.
[53] E. Tziperman,et al. Finite Difference of Adjoint or Adjoint of Finite Difference , 1997 .
[54] Christian Bischof,et al. ADIC: an extensible automatic differentiation tool for ANSI-C , 1997 .
[55] F. Potra,et al. Sensitivity analysis for atmospheric chemistry models via automatic differentiation , 1997 .
[56] M. Jerrell. Automatic Differentiation and Interval Arithmetic for Estimation of Disequilibrium Models , 1997 .
[57] Geoffrey E. Hinton,et al. Generative models for discovering sparse distributed representations , 1997, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.
[58] Christian H. Bischof,et al. ADIC: an extensible automatic differentiation tool for ANSI‐C , 1997, Softw. Pract. Exp..
[59] Xavier Leroy,et al. The effectiveness of type-based unboxing , 1997 .
[60] Siegfried M. Rump,et al. INTLAB - INTerval LABoratory , 1998, SCAN.
[61] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[62] Léon Bottou. Online Learning and Stochastic Approximations , 1998 .
[63] Thomas Kaminski,et al. Recipes for adjoint code construction , 1998, TOMS.
[64] P. Wedin,et al. Regularization tools for training large feed-forward neural networks using automatic differentiation , 1998 .
[65] Andrew W. Fitzgibbon,et al. Bundle Adjustment - A Modern Synthesis , 1999, Workshop on Vision Algorithms.
[66] Léon Bottou,et al. On-line learning and stochastic approximations , 1999 .
[67] Nicol N. Schraudolph,et al. Local Gain Adaptation in Stochastic Gradient Descent , 1999 .
[68] A. Chambolle,et al. Inverse problems in image processing and image segmentation : some mathematical and numerical aspects , 2000 .
[69] I. Charpentier,et al. Efficient adjoint derivatives: application to the meteorological model meso-nh , 2000 .
[70] Bruce Christianson,et al. Application of automatic differentiation to race car performance optimisation , 2000 .
[71] G. Haase,et al. Optimal sizing of industrial structural mechanics problems using AD , 2000 .
[72] Andreas Griewank,et al. Evaluating derivatives - principles and techniques of algorithmic differentiation, Second Edition , 2000, Frontiers in applied mathematics.
[73] S. Forth,et al. Aerofoil optimisation via AD of a multigrid cell-vertex Euler flow solver , 2000 .
[74] Gerald J. Sussman,et al. Structure and interpretation of classical mechanics , 2001 .
[75] H. Martin Bücker,et al. Automatic differentiation for computational finance , 2002 .
[76] Scott Tremaine,et al. Structure and Interpretation of Classical Mechanics , 2002 .
[77] Christian H. Bischof,et al. Implementation of automatic differentiation tools , 2002, PEPM '02.
[78] Erich Kaltofen,et al. Computer algebra handbook , 2002 .
[79] Nicol N. Schraudolph,et al. Combining Conjugate Direction Methods with Stochastic Approximation of Gradients , 2003, AISTATS.
[80] Alain Dervieux,et al. Automatic Differentiation for Optimum Design, Applied to Sonic Boom Reduction , 2003, ICCSA.
[81] Andreas Griewank,et al. Introduction to Automatic Differentiation , 2003 .
[82] Andreas Griewank,et al. A mathematical view of automatic differentiation , 2003, Acta Numerica.
[83] Uwe Naumann,et al. Optimal accumulation of Jacobian matrices by elimination methods on the dual computational graph , 2004, Math. Program..
[84] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[85] Jens-Dominik Müller,et al. On the performance of discrete adjoint CFD codes using automatic differentiation , 2005 .
[86] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[87] M. Sambridge,et al. Automatic differentiation in geophysical inverse problems , 2005 .
[88] X. Yi. On Automatic Differentiation , 2005 .
[89] Barak A. Pearlmutter,et al. Perturbation Confusion and Referential Transparency: Correct Functional Implementation of Forward-Mode AD , 2005 .
[90] Shaun A. Forth. An efficient overloaded implementation of forward mode automatic differentiation in MATLAB , 2006, TOMS.
[91] Laurent Hascoët,et al. The Data-Flow Equations of Checkpointing in Reverse Automatic Differentiation , 2006, International Conference on Computational Science.
[92] Mark W. Schmidt,et al. Accelerated training of conditional random fields with stochastic gradient methods , 2006, ICML.
[93] Louis B. Rall,et al. Perspectives on Automatic Differentiation: Past, Present, and Future? , 2006 .
[94] Uwe Naumann,et al. Computing Adjoints with the NAGWare Fortran 95 Compiler , 2006 .
[95] E. Dowell,et al. Using Automatic Differentiation to Create a Nonlinear Reduced Order Model of a Computational Fluid Dynamic Solver , 2006 .
[96] J.-F. Ostiguy,et al. Mxyzptlk: An efficient, native C++ differentiation engine , 2007, 2007 IEEE Particle Accelerator Conference (PAC).
[97] Andrea Walther,et al. Automatic differentiation of explicit Runge-Kutta methods for optimal control , 2007, Comput. Optim. Appl..
[98] Zhenzhen Liu,et al. Fast and Scalable Recurrent Neural Network Learning based on Stochastic Meta-Descent , 2007, 2007 American Control Conference.
[99] Horst Bischof,et al. Algorithmic Differentiation: Application to Variational Problems in Computer Vision , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[100] Emil Slusanschi,et al. Automatic Differentiation of the General-Purpose Computational Fluid Dynamics Package FLUENT , 2007 .
[101] Li Yan,et al. Application of PID Controller Based on BP Neural Network Using Automatic Differentiation Method , 2008, ISNN.
[102] Barak A. Pearlmutter,et al. Using Polyvariant Union-Free Flow Analysis to Compile a Higher-Order Functional-Programming Language with a First-Class Derivative Operator to Efficient Fortran-like Code , 2008 .
[103] Laurent Hascoët,et al. TAPENADE for C , 2008 .
[104] Christian H. Bischof,et al. On the implementation of automatic differentiation tools , 2002, PEPM '02.
[105] James V. Burke,et al. Algorithmic Differentiation of Implicit Functions and Optimal Values , 2008 .
[106] Barak A. Pearlmutter,et al. Nesting forward-mode AD in a functional framework , 2008, High. Order Symb. Comput..
[107] Yi Cao,et al. Nonlinear system identification for predictive control using continuous time recurrent neural networks and automatic differentiation , 2008 .
[108] Barak A. Pearlmutter,et al. Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator , 2008, TOPL.
[109] Bernhard Kainz,et al. Automatic Differentiation for GPU-Accelerated 2D/3D Registration , 2008 .
[110] Christopher D. Manning,et al. Efficient, Feature-based, Conditional Random Field Parsing , 2008, ACL.
[111] Johannes Willkomm,et al. Introduction to Automatic Differentiation , 2009 .
[112] Jonathan Cohen,et al. A Fast Double Precision CFD Code using CUDA , 2009 .
[113] Andrea Walther,et al. Efficient Computation of Sparse Hessians Using Coloring and Automatic Differentiation , 2009, INFORMS J. Comput..
[114] D. G. Sotiropoulos,et al. A memoryless BFGS neural network training algorithm , 2009, 2009 7th IEEE International Conference on Industrial Informatics.
[115] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[116] Andrea Walther,et al. Getting Started with ADOL-C , 2009, Combinatorial Scientific Computing.
[117] L. Capriotti. Fast Greeks by Algorithmic Differentiation , 2010 .
[118] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[119] Johannes Willkomm,et al. Automatic Differentiation for Matlab , 2010 .
[120] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[121] Noah A. Smith,et al. Distributed Asynchronous Online Learning for Natural Language Processing , 2010, CoNLL.
[122] Kenneth Ruud,et al. Arbitrary-Order Density Functional Response Theory from Automatic Differentiation , 2010, Journal of chemical theory and computation.
[123] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[124] Lukás Burget,et al. Extensions of recurrent neural network language model , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[125] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods , 2011 .
[126] Clément Farabet,et al. Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.
[127] Radford M. Neal. MCMC Using Hamiltonian Dynamics , 2011, 1206.1901.
[128] Noah D. Goodman,et al. Nonstandard Interpretations of Probabilistic Programs for Efficient Inference , 2011, NIPS.
[129] Andreas Griewank,et al. On the numerical stability of algorithmic differentiation , 2012, Computing.
[130] M. Girolami,et al. Riemann manifold Langevin and Hamiltonian Monte Carlo methods , 2011, Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[131] Bruce Christianson. A Leibniz Notation for Automatic Differentiation , 2012 .
[132] Razvan Pascanu,et al. Theano: new features and speed improvements , 2012, ArXiv.
[133] Andreas Griewank,et al. Who Invented the Reverse Mode of Differentiation , 2012 .
[134] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[135] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[136] Alex Pothen,et al. ColPack: Software for graph coloring and related problems in scientific computing , 2013, TOMS.
[137] Jeffrey Mark Siskind,et al. Felzenszwalb-Baum-Welch: Event Detection by Changing Appearance , 2013, ArXiv.
[138] Jeffrey Mark Siskind,et al. Grounded Language Learning from Video Described with Sentences , 2013, ACL.
[139] Daniel Cohen-Or,et al. Geosemantic Snapping for Sketch‐Based Modeling , 2013, Comput. Graph. Forum.
[140] Noah D. Goodman. The principles and practice of probabilistic programming , 2013, POPL.
[141] Tom Schaul,et al. No more pesky learning rates , 2012, ICML.
[142] Laurent Hascoët,et al. The Tapenade automatic differentiation tool: Principles, model, and specification , 2013, TOMS.
[143] Noah D. Goodman,et al. Learning Stochastic Inverses , 2013, NIPS.
[144] John Tran,et al. cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.
[145] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.
[146] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[147] Thomas A. Henzinger,et al. Probabilistic programming , 2014, FOSE.
[148] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[149] Varun Ramakrishna,et al. User-Specific Hand Modeling from Monocular Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[150] Noah D. Goodman,et al. Amortized Inference in Probabilistic Reasoning , 2014, CogSci.
[151] Danqi Chen,et al. A Fast and Accurate Dependency Parser using Neural Networks , 2014, EMNLP.
[152] Michael J. Black,et al. OpenDR: An Approximate Differentiable Renderer , 2014, ECCV.
[153] Alex Graves,et al. Neural Turing Machines , 2014, ArXiv.
[154] Andrew Gelman,et al. The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo , 2011, J. Mach. Learn. Res..
[155] T. van Amelsvoort. Bridging the Gap , 2014, Tijdschrift voor psychiatrie.
[156] Ilker Yildirim. Efficient and robust analysis-by-synthesis in vision: A computational framework, behavioral tests, and modeling neuronal representations , 2015 .
[157] Bob Carpenter,et al. The Stan Math Library: Reverse-Mode Automatic Differentiation in C++ , 2015, ArXiv.
[158] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[159] Christopher Ré,et al. Caffe con Troll: Shallow Ideas to Speed Up Deep Learning , 2015, DanaC@SIGMOD.
[160] Jason Weston,et al. End-To-End Memory Networks , 2015, NIPS.
[161] Joshua B. Tenenbaum,et al. Efficient analysis-by-synthesis in vision: A computational framework, behavioral tests, and modeling neuronal representations , 2015, Annual Meeting of the Cognitive Science Society.
[162] Kenta Oono,et al. Chainer: a Next-Generation Open Source Framework for Deep Learning , 2015 .
[163] Max Welling,et al. Markov Chain Monte Carlo and Variational Inference: Bridging the Gap , 2014, ICML.
[164] Emanuel Todorov,et al. Graphical Newton , 2015, ArXiv.
[165] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[166] Yoshua Bengio,et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.
[167] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[168] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.
[169] Tomas Mikolov,et al. Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets , 2015, NIPS.
[170] Ryan P. Adams,et al. Gradient-based Hyperparameter Optimization through Reversible Learning , 2015, ICML.
[171] Phil Blunsom,et al. Learning to Transduce with Unbounded Memory , 2015, NIPS.
[172] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[173] Joshua B. Tenenbaum,et al. Picture: A probabilistic programming language for scene perception , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[174] David Eichelberger. Computer Algebra Handbook Foundations Applications Systems , 2016 .
[175] Barak A. Pearlmutter,et al. Efficient Implementation of a Higher-Order Language with Built-In AD , 2016, ArXiv.
[176] Amit Agarwal,et al. CNTK: Microsoft's Open-Source Deep-Learning Toolkit , 2016, KDD.
[177] Emil Slusanschi,et al. ADiJaC -- Automatic Differentiation of Java Classfiles , 2016, ACM Trans. Math. Softw..
[178] Geoffrey E. Hinton,et al. Attend, Infer, Repeat: Fast Scene Understanding with Generative Models , 2016, NIPS.
[179] Barak A. Pearlmutter,et al. Tricks from Deep Learning , 2016, ArXiv.
[180] Richard Socher,et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing , 2015, ICML.
[181] Wojciech Zaremba,et al. Learning Simple Algorithms from Examples , 2015, ICML.
[182] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[183] Yoav Goldberg,et al. A Primer on Neural Network Models for Natural Language Processing , 2015, J. Artif. Intell. Res..
[184] Sergio Gomez Colmenarejo,et al. Hybrid computing using a neural network with dynamic external memory , 2016, Nature.
[185] Zuzana Kukelova,et al. A Benchmark of Selected Algorithmic Differentiation Tools on Some Problems in Machine Learning and Computer Vision , 2016 .
[186] Miles Lubin,et al. Forward-Mode Automatic Differentiation in Julia , 2016, ArXiv.
[187] John Salvatier,et al. Probabilistic programming in Python using PyMC3 , 2016, PeerJ Comput. Sci..
[188] Barak A. Pearlmutter,et al. DiffSharp: An AD Library for .NET Languages , 2016, ArXiv.
[189] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[190] Ryan P. Adams,et al. Composing graphical models with neural networks for structured representations and fast inference , 2016, NIPS.
[191] Noah D. Goodman,et al. Deep Amortized Inference for Probabilistic Programs , 2016, ArXiv.
[192] Dustin Tran,et al. Edward: A library for probabilistic modeling, inference, and criticism , 2016, ArXiv.
[193] Dougal Maclaurin,et al. Modeling, Inference and Optimization With Composable Differentiable Procedures , 2016 .
[194] Naman Agarwal,et al. Second Order Stochastic Optimization in Linear Time , 2016, ArXiv.
[195] Alex Graves,et al. Memory-Efficient Backpropagation Through Time , 2016, NIPS.
[196] Yu Hai-na,et al. Application of PID Controller Based on BP Neural Network in Temperature Control of Aquaculture Greenhouse , 2016 .
[197] J. Zico Kolter,et al. OptNet: Differentiable Optimization as a Layer in Neural Networks , 2017, ICML.
[198] Frank D. Wood,et al. Learning Disentangled Representations with Semi-Supervised Deep Generative Models , 2017, NIPS.
[199] Philipp Koehn,et al. Neural Machine Translation , 2017, ArXiv.
[200] Dustin Tran,et al. Automatic Differentiation Variational Inference , 2016, J. Mach. Learn. Res..
[201] Geoffrey E. Hinton,et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , 2017, ICLR.
[202] Dustin Tran,et al. Deep Probabilistic Programming , 2017, ICLR.
[203] Jiqiang Guo,et al. Stan: A Probabilistic Programming Language , 2017, Journal of statistical software.
[204] Jascha Sohl-Dickstein,et al. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models , 2017, NIPS.
[205] Dan Moldovan,et al. Tangent: Automatic Differentiation Using Source Code Transformation in Python , 2017, ArXiv.
[206] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[207] Kenneth O. Stanley,et al. Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning , 2017, ArXiv.
[208] Frank D. Wood,et al. Inference Compilation and Universal Probabilistic Programming , 2016, AISTATS.
[209] Naman Agarwal,et al. Second-Order Stochastic Optimization for Machine Learning in Linear Time , 2016, J. Mach. Learn. Res..
[210] Barak A. Pearlmutter,et al. Divide-and-conquer checkpointing for arbitrary programs with no user annotation , 2017, Optim. Methods Softw..
[211] Mark W. Schmidt,et al. Online Learning Rate Adaptation with Hypergradient Descent , 2017, ICLR.
[212] David Duvenaud,et al. Backpropagation through the Void: Optimizing control variates for black-box gradient estimation , 2017, ICLR.
[213] Jorge Nocedal,et al. Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..
[214] Barak A. Pearlmutter,et al. Perturbation confusion in forward automatic differentiation of higher-order functions , 2012, Journal of Functional Programming.
[215] Enate,et al. Stochastic volatility: Bayesian computation using automatic differentiation and the extended Kalman filter , 2003 .