What's Strange About Recent Events (WSARE): An Algorithm for the Early Detection of Disease Outbreaks

Traditional biosurveillance algorithms detect disease outbreaks by looking for peaks in a univariate time series of health-care data. Current health-care surveillance data, however, are no longer simply univariate data streams. Instead, a wealth of spatial, temporal, demographic and symptomatic information is available. We present an early disease outbreak detection algorithm called What's Strange About Recent Events (WSARE), which uses a multivariate approach to improve its timeliness of detection. WSARE employs a rule-based technique that compares recent health-care data against data from a baseline distribution and finds subgroups of the recent data whose proportions have changed the most from the baseline data. Health-care data also pose difficulties for surveillance algorithms because of inherent temporal trends such as seasonal effects and day-of-week variations. WSARE addresses this problem by using a Bayesian network to produce a baseline distribution that accounts for these temporal trends. The algorithm incorporates a wide range of ideas, including association rules, Bayesian networks, hypothesis testing and permutation tests, to produce a detection algorithm that is careful to evaluate the significance of the alarms that it raises.
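
The rule-based comparison described above can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: rules are restricted to a single attribute-value pair, each rule is scored with a two-sided Fisher exact test (the abstract says only "hypothesis testing"), the baseline is a plain list of past records rather than a sample drawn from a Bayesian network, and the function names `fisher_two_sided`, `best_rule` and `compensated_p_value` are hypothetical.

```python
import random
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x matches in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    tail = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, tail)


def best_rule(recent, baseline):
    """Return (p_value, (attribute, value)) for the single-attribute rule
    whose proportion differs most between recent and baseline records."""
    best_p, best = 1.0, None
    candidates = sorted({(k, v) for rec in recent + baseline for k, v in rec.items()})
    for attr, value in candidates:
        a = sum(1 for r in recent if r[attr] == value)
        c = sum(1 for r in baseline if r[attr] == value)
        p = fisher_two_sided(a, len(recent) - a, c, len(baseline) - c)
        if p < best_p:
            best_p, best = p, (attr, value)
    return best_p, best


def compensated_p_value(recent, baseline, n_shuffles=200, seed=0):
    """Randomization test: shuffle the recent/baseline labels and count how
    often the best rule on shuffled data scores at least as well as the
    best rule on the real data. This corrects for having searched many rules."""
    rng = random.Random(seed)
    observed_p, rule = best_rule(recent, baseline)
    pooled = recent + baseline
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_p, _rule = best_rule(pooled[:len(recent)], pooled[len(recent):])
        if shuffled_p <= observed_p:
            hits += 1
    # Add-one estimate so the reported p-value is never exactly zero.
    return rule, observed_p, (hits + 1) / (n_shuffles + 1)
```

Each record is assumed to be a dictionary with the same keys, for example `{"symptom": "respiratory", "region": "north"}`. Calling `compensated_p_value(recent, baseline)` returns the most anomalous rule, its raw score, and the score adjusted for the search over rules; an alarm would be raised only when the adjusted value falls below a chosen threshold.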
