MC-LSTM: Mass-Conserving LSTM

The success of Convolutional Neural Networks (CNNs) in computer vision is mainly driven by their inductive bias, which is strong enough to allow CNNs to solve vision-related tasks with random weights, that is, without learning. Similarly, Long Short-Term Memory (LSTM) has a strong inductive bias toward storing information over time. However, many real-world systems, for example physical and economic systems, are governed by conservation laws, which lead to the redistribution of particular quantities. Our novel Mass-Conserving LSTM (MC-LSTM) adheres to these conservation laws by extending the inductive bias of LSTM to model the redistribution of those stored quantities. MC-LSTMs set a new state of the art for neural arithmetic units at learning arithmetic operations such as addition, which obeys a strict conservation law because the sum is constant over time. Further, MC-LSTM is applied to traffic forecasting, to modeling a damped pendulum, and to a large benchmark dataset in hydrology, where it sets a new state of the art for predicting peak flows. In the hydrology example, we show that MC-LSTM states correlate with real-world processes and are therefore interpretable.
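To make the idea of redistributing a conserved quantity concrete, the following is a minimal NumPy sketch of one mass-conserving recurrent step. It is an illustration under stated assumptions, not the authors' implementation: the names `init_params`, `mass_conserving_step`, and `run_sequence` are hypothetical, and the gates are fixed learned parameters here, whereas a full model would presumably compute them from the inputs and the current state. The sketch only assumes that incoming mass is split across cells by weights that sum to one, that mass moves between cells through a matrix whose columns sum to one, and that a fraction of each cell's mass leaves as output.

```python
import numpy as np


def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_cells, rng):
    """Hypothetical parameter set; the gates are input-independent for simplicity."""
    return {
        "w_i": rng.normal(size=n_cells),             # input-gate logits
        "W_r": rng.normal(size=(n_cells, n_cells)),  # redistribution logits
        "w_o": rng.normal(size=n_cells),             # output-gate logits
    }


def mass_conserving_step(c_prev, x_mass, params):
    """One step: redistribute stored mass, add new mass, release some of it."""
    i = softmax(params["w_i"], axis=0)  # entries sum to 1: splits incoming mass
    R = softmax(params["W_r"], axis=0)  # each column sums to 1: moves mass between cells
    o = sigmoid(params["w_o"])          # fraction of each cell's mass that leaves
    m_total = R @ c_prev + i * x_mass   # mass present in each cell before output
    h = o * m_total                     # mass leaving the system at this step
    c = (1.0 - o) * m_total             # mass kept in the cells
    return c, h


def run_sequence(xs, params, n_cells):
    """Feed a sequence of scalar mass inputs; return final cells and summed outputs."""
    c = np.zeros(n_cells)
    outputs = []
    for x in xs:
        c, h = mass_conserving_step(c, x, params)
        outputs.append(h.sum())
    return c, np.array(outputs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(4, rng)
    xs = rng.uniform(0.0, 5.0, size=20)
    c, hs = run_sequence(xs, params, n_cells=4)
    # Stored mass plus everything released so far should equal the total input.
    print(np.isclose(c.sum() + hs.sum(), xs.sum()))
```

Because the input weights and every column of the redistribution matrix are normalised to sum to one, no mass is created or destroyed inside the cell. At every step the stored mass plus the cumulative output equals the cumulative input, which is the conservation property the abstract describes.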
