Pipe failure prediction: A data mining method

Pipe breaks in urban water distribution networks lead to significant economic and social costs, putting both the service quality and the profit of water utilities at risk. To cope with this situation, scheduled preventive maintenance is desirable: it aims to predict which pipes are likely to break and fix them proactively. Physical models developed for understanding and predicting pipe failure are usually expensive and can therefore be applied only to a limited number of trunk pipes. As an alternative, statistical models that predict pipe breaks from historical data are far less expensive, and have recently attracted considerable interest from water utilities. In this paper, we report a novel data mining prediction system that has been built for a water utility in a large Chinese city. We describe various aspects of building such a system, including problem formulation, data cleaning, model construction, and the evaluation of attribute importance according to the requirements of end users in water utilities. Our prediction system achieves satisfactory results. For example, with the system trained on the dataset available at the end of 2010, the water utility could have avoided 50% of the pipe breaks in 2011 by examining only 6.98% of its pipes in advance. During the construction of the system, we found that, interestingly, the extremely skewed distribution of break and non-break pipes is not an obstacle. This lesson could serve as a practical reference both for academic studies on imbalanced learning and for future work on pipe failure prediction problems.
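The headline figure above (50% of breaks covered by inspecting 6.98% of pipes) is a ranking-style measure: pipes are ordered by predicted risk, and one asks how far down the list an inspector must go to cover a given share of the breaks that actually occurred. The following is a minimal sketch of that calculation, not the authors' implementation. The function name `fraction_to_inspect`, the score values, and the toy break labels are all hypothetical and chosen only for illustration.

```python
from math import ceil


def fraction_to_inspect(scores, broke, target_recall=0.5):
    """Return the smallest fraction of pipes, taken in descending order of
    risk score, that covers at least `target_recall` of the observed breaks.

    scores: predicted risk score per pipe (higher means riskier); hypothetical.
    broke:  1 if the pipe broke in the evaluation year, else 0.
    """
    if len(scores) != len(broke):
        raise ValueError("scores and broke must have the same length")
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")

    total_breaks = sum(broke)
    if total_breaks == 0:
        raise ValueError("no breaks in the evaluation data")

    # Number of breaks that must be covered to reach the target share.
    needed = ceil(target_recall * total_breaks)

    # Rank pipes from highest to lowest predicted risk.
    ranked = sorted(zip(scores, broke), key=lambda pair: pair[0], reverse=True)

    found = 0
    for inspected, (_, did_break) in enumerate(ranked, start=1):
        found += did_break
        if found >= needed:
            return inspected / len(ranked)
    return 1.0


# Toy data (assumed, not from the paper): 10 pipes, 4 of which broke.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_broke = [1, 0, 1, 0, 0, 0, 1, 0, 0, 1]

toy_fraction = fraction_to_inspect(toy_scores, toy_broke, target_recall=0.5)
print(f"Inspect {toy_fraction:.0%} of pipes to cover 50% of breaks")
```

On the toy data, two of the four breaks are reached after the three highest-scored pipes, so the function returns 0.3. Because the measure depends only on the ordering of the scores, it can remain informative even when break pipes are a very small minority of the network.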
