Rule-based anomaly pattern detection for detecting disease outbreaks

This paper presents an algorithm for performing early detection of disease outbreaks by searching a database of emergency department cases for anomalous patterns. Traditional techniques for anomaly detection are unsatisfactory for this problem because they identify individual data points that are rare due to particular combinations of features. When applied to our scenario, these traditional algorithms discover isolated outliers of particularly strange events, such as someone accidentally shooting their ear, that are not indicative of a new outbreak. Instead, we would like to detect anomalous patterns. These patterns are groups with specific characteristics whose recent pattern of illness is anomalous relative to historical patterns. We propose using a rule-based anomaly detection algorithm that characterizes each anomalous pattern with a rule. The significance of each rule is carefully evaluated using Fisher's Exact Test and a randomization test. Our algorithm is compared against a standard detection algorithm by measuring the number of false positives and the timeliness of detection. Simulated data, produced by a simulator that creates the effects of an epidemic on a city, is used for evaluation. The results indicate that our algorithm has significantly better detection times for common significance thresholds while having a slightly higher false positive rate.

[1]  M. J.,et al.  CONTROLLING THE FALSE-DISCOVERY RATE IN ASTROPHYSICAL DATA ANALYSIS , 2001 .

[2]  Christopher M. Bishop,et al.  Novelty detection and neural network validation , 1994 .

[3]  Stephen D. Bay,et al.  Detecting change in categorical data: mining contrast sets , 1999, KDD '99.

[4]  Dean F. Sittig,et al.  The emerging science of very early detection of disease outbreaks. , 2001, Journal of public health management and practice : JPHMP.

[5]  Andrew W. Moore,et al.  The Racing Algorithm: Model Selection for Lazy Learners , 1997, Artificial Intelligence Review.

[6]  Eleazar Eskin,et al.  Anomaly Detection over Noisy Data using Learned Probability Distributions , 2000, ICML.

[7]  Anna Goldenberg,et al.  Framework for using grocery data for early detection of bio-terrorism attacksa , 2022 .

[8]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[9]  Greg Hamerly,et al.  Bayesian approaches to failure prediction for disk drives , 2001, ICML.

[10]  P. Good,et al.  Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses , 1995 .

[11]  Rajeev Motwani,et al.  Dynamic itemset counting and implication rules for market basket data , 1997, SIGMOD '97.

[12]  Kymie M. C. Tan,et al.  Anomaly Detection in Embedded Systems , 2002, IEEE Trans. Computers.

[13]  Carla E. Brodley,et al.  Temporal sequence learning and data reduction for anomaly detection , 1998, CCS '98.