Meta-Solver for Neural Ordinary Differential Equations

A conventional approach to training neural ordinary differential equations (ODEs) is to fix an ODE solver and then learn the neural network's weights to optimize a target loss function. However, such an approach is tied to a specific discretization method and its properties, which may not be optimal for the application at hand and may lead to overfitting to the chosen solver. In this paper, we investigate how variability in the space of solvers can improve the performance of neural ODEs. We consider a family of Runge-Kutta methods parameterized by no more than two scalar variables. Based on the solvers' properties, we propose an approach to reduce neural ODE overfitting to a pre-defined solver, along with a criterion to evaluate this behaviour. Moreover, we show that the right choice of solver parameterization can significantly affect the robustness of neural ODE models to adversarial attacks. It was recently shown that neural ODEs outperform conventional CNNs in terms of robustness; our work demonstrates that this robustness can be further improved by optimizing the solver choice for a given task. The source code to reproduce our experiments is available at https://github.com/juliagusak/neural-ode-metasolver.
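
To make the idea of a parameterized solver family concrete, the sketch below shows a one-parameter family of explicit two-stage Runge-Kutta methods (alpha = 0.5 gives the midpoint rule, alpha = 1.0 gives Heun's method) and a training-time forward pass that samples the solver parameter per batch so the learned dynamics do not specialize to a single discretization. This is a minimal illustration, not the authors' implementation: the helper names (rk2_step, odeint_rk2, forward_with_random_solver), the step count, and the alpha sampling range are assumptions made for the example.

```python
# Minimal sketch of a parameterized Runge-Kutta family for neural ODEs.
# Any alpha != 0 yields an order-2 explicit method; alpha = 0.5 is the
# midpoint rule and alpha = 1.0 is Heun's method.
import torch


def rk2_step(f, t, y, h, alpha):
    """One explicit RK2 step with tableau c2 = a21 = alpha, b = (1 - 1/(2*alpha), 1/(2*alpha))."""
    k1 = f(t, y)
    k2 = f(t + alpha * h, y + alpha * h * k1)
    b1 = 1.0 - 1.0 / (2.0 * alpha)
    b2 = 1.0 / (2.0 * alpha)
    return y + h * (b1 * k1 + b2 * k2)


def odeint_rk2(f, y0, t0, t1, n_steps, alpha):
    """Integrate dy/dt = f(t, y) from t0 to t1 with a fixed-step RK2 solver."""
    h = (t1 - t0) / n_steps
    y, t = y0, t0
    for _ in range(n_steps):
        y = rk2_step(f, t, y, h, alpha)
        t = t + h
    return y


def forward_with_random_solver(dynamics, y0, alpha_range=(0.4, 1.0)):
    """Hypothetical training-time forward pass: draw a fresh solver parameter
    per call so the model is not tuned to one particular discretization."""
    alpha = float(torch.empty(1).uniform_(*alpha_range))
    return odeint_rk2(dynamics, y0, t0=0.0, t1=1.0, n_steps=8, alpha=alpha)
```

Here `dynamics` stands for the learned vector field, i.e. any callable `f(t, y)` such as a small PyTorch module; at test time one would instead fix or select `alpha` according to the task.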
