A Machine Learning Approach to Pattern Detection and Prediction for Environmental Monitoring and Water Sustainability

We describe one of the successful products of a research partnership among several academic institutions (CMU, Oxford and UBC) and a water monitoring company (Aquatic Informatics). Water monitoring sensors are diverse and remotely distributed, and they produce vast quantities of nonlinear, nonstationary data. In addition, unanticipated environmental conditions and limitations in the sensing and communications hardware corrupt the data with previously uncharacterized nonlinearities, missing observations, spikes and multiple discontinuities. To improve the quality of the data and the monitoring process, this paper introduces an approach that uses Gaussian processes and a general “fault bucket” to capture a priori uncharacterized faults, along with an approximate method for marginalizing the potential faultiness of all observations. This gives rise to an efficient, flexible algorithm for the detection and automatic correction of faults. Because the method is probabilistic, it is well suited to reporting uncertainty estimates to human operators. The approach can also be applied to detect patterns other than faults that are of great environmental significance. We present a fish sustainability example, in which specific patterns in water level need to be detected so that fish do not get trapped and die in shallow pools.

Appearing in the ICML 2011 Workshop on Machine Learning for Global Challenges, Bellevue, WA, USA, 2011. Copyright 2011 by the author(s)/owner(s).
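
The fault-bucket idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`rbf_kernel`, `fault_bucket_filter`), the squared-exponential kernel, the sequential one-step-ahead scheme, and all parameter values are assumptions chosen for illustration. Each observation is treated as either normal (small noise variance) or faulty (a very wide "bucket" noise variance), its posterior fault probability is computed from the Gaussian process prediction, and the two hypotheses are approximately marginalized by blending the two noise variances with those probabilities.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gaussian_pdf(y, mean, var):
    """Density of a univariate Gaussian."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fault_bucket_filter(t, y, noise_var=0.01, fault_var=100.0,
                        prior_fault=0.05, length_scale=1.0, signal_var=1.0):
    """Sequentially score each reading's fault probability (illustrative).

    Returns (p_fault, pred_mean): the posterior probability that each
    observation is faulty, and the one-step-ahead GP prediction that could
    stand in for a reading judged faulty.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(t)
    p_fault = np.zeros(n)
    eff_noise = np.zeros(n)   # blended noise variance per observation
    pred_mean = np.zeros(n)
    for i in range(n):
        if i == 0:
            m, v = 0.0, signal_var            # zero-mean GP prior
        else:
            K = rbf_kernel(t[:i], t[:i], length_scale, signal_var)
            K = K + np.diag(eff_noise[:i])
            k_star = rbf_kernel(t[:i], t[i:i + 1], length_scale, signal_var)[:, 0]
            m = k_star @ np.linalg.solve(K, y[:i])
            v = signal_var - k_star @ np.linalg.solve(K, k_star)
            v = max(v, 1e-12)                 # guard against round-off
        # Evidence for "normal" versus "fault bucket" (very wide noise).
        lik_ok = (1.0 - prior_fault) * gaussian_pdf(y[i], m, v + noise_var)
        lik_fault = prior_fault * gaussian_pdf(y[i], m, v + fault_var)
        p_fault[i] = lik_fault / (lik_ok + lik_fault)
        # Approximate marginalization (assumed here): blend noise variances.
        eff_noise[i] = (1.0 - p_fault[i]) * noise_var + p_fault[i] * fault_var
        pred_mean[i] = m
    return p_fault, pred_mean
```

In this sketch a spike receives a fault probability near one, so its blended noise variance is large and it has little influence on later predictions, while `p_fault` itself is the kind of uncertainty estimate that could be shown to an operator. The cubic cost of re-solving the linear system at every step is acceptable only for short series; an efficient implementation would update the factorization incrementally.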
