Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets

Abstract. Regional rainfall–runoff modeling is an old but still largely unsolved problem in the hydrological sciences. The difficulty is that traditional hydrological models lose considerable performance when calibrated for multiple basins together rather than for a single basin alone. In this paper, we propose a novel, data-driven approach using Long Short-Term Memory networks (LSTMs) and demonstrate that, under a “big data” paradigm, this degradation is not inevitable. By training a single LSTM model on 531 basins from the CAMELS dataset using meteorological time series data and static catchment attributes, we significantly improved performance compared to a set of several different hydrological benchmark models. Our proposed approach not only significantly outperforms hydrological models that were calibrated regionally, but also achieves better performance than hydrological models that were calibrated for each basin individually. Furthermore, we propose an adaptation of the standard LSTM architecture, which we call an Entity-Aware-LSTM (EA-LSTM), that allows catchment similarities to be learned as a feature layer in a deep learning model. We show that these learned catchment similarities correspond well to what we would expect from prior hydrological understanding.
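To make the idea of learning catchment similarities "as a feature layer" more concrete, the following is a minimal NumPy sketch of one way an entity-aware LSTM cell could be arranged. It assumes that the input gate is computed once from the static catchment attributes only, while the forget gate, output gate, and cell candidate are driven by the meteorological inputs and the hidden state. The abstract does not spell out this gate layout, so treat it as an assumption. The function names (`ea_lstm_forward`, `init_params`), the layer sizes, and the random weights are all hypothetical and chosen for illustration; no training is performed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_dyn, n_static, hidden, seed=0):
    """Random, untrained weights; sizes are illustrative only."""
    rng = np.random.default_rng(seed)
    params = {
        # Input gate sees only the static attributes (assumption).
        "W_i": rng.normal(0.0, 0.1, (hidden, n_static)),
        "b_i": np.zeros(hidden),
    }
    # Remaining gates see the dynamic input concatenated with the hidden state.
    for gate in ("f", "g", "o"):
        params[f"W_{gate}"] = rng.normal(0.0, 0.1, (hidden, n_dyn + hidden))
        params[f"b_{gate}"] = np.zeros(hidden)
    return params


def ea_lstm_forward(x_dyn, x_static, params):
    """x_dyn: (T, n_dyn) forcing series; x_static: (n_static,) attributes."""
    hidden = params["b_i"].shape[0]
    # Static input gate: constant over time, acts as a per-basin embedding.
    i = sigmoid(params["W_i"] @ x_static + params["b_i"])
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x_dyn:
        z = np.concatenate([x_t, h])
        f = sigmoid(params["W_f"] @ z + params["b_f"])  # forget gate
        g = np.tanh(params["W_g"] @ z + params["b_g"])  # cell candidate
        o = sigmoid(params["W_o"] @ z + params["b_o"])  # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h, i


# Hypothetical sizes: 5 forcing variables, 10 static attributes, 8 cells.
params = init_params(n_dyn=5, n_static=10, hidden=8)
rng = np.random.default_rng(1)
forcings = rng.normal(size=(30, 5))   # synthetic 30-step forcing series
attributes = rng.normal(size=10)      # synthetic static attributes
h_last, basin_embedding = ea_lstm_forward(forcings, attributes, params)
```

In this arrangement, `basin_embedding` depends only on the static attributes, so two basins with similar attributes receive similar gate vectors regardless of the weather they experience. That is one plausible reading of how catchment similarity could be inspected directly from a layer of the network.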
