Data mining for early disease outbreak detection

This thesis presents an early disease outbreak detection algorithm called What's Strange About Recent Events (WSARE). Traditional outbreak detection algorithms look for peaks in a univariate time series of health-care data. WSARE instead takes a novel multivariate approach to detect outbreaks sooner. Health-care data used for surveillance are no longer simply a time series of aggregate daily counts. A wealth of spatial, temporal, demographic, and symptom information is now available. WSARE incorporates all of this information with a rule-based approach: it compares recent health-care data against data from a baseline distribution and finds the subgroups of the data whose proportions have changed the most in the recent data. Health-care data also pose difficulties for surveillance algorithms because of inherent temporal trends such as seasonal effects and day-of-week variations. WSARE addresses this problem by using a Bayesian network to produce a baseline distribution that accounts for these temporal trends. The algorithm combines a wide range of ideas, including association rules, Bayesian networks, hypothesis testing, and permutation tests. The result is a powerful detection algorithm that carefully evaluates the significance of the alarms it raises.
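The rule-based comparison and the significance check described above can be illustrated with a small sketch in Python. It rests on several assumptions that are not taken from the abstract. Rules are limited to a single "attribute = value" component. Each rule is scored with a two-sided Fisher's exact test on a 2x2 table of recent versus baseline counts. Only subgroups whose proportion increased in the recent data are considered. The baseline is a fixed list of records rather than one generated by a Bayesian network. The best score is then compensated for the search over many rules with a permutation test that shuffles the recent/baseline labels. The names `fisher_two_sided`, `best_rule`, and `permutation_test` are hypothetical and do not come from the thesis.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]  # one case: attribute name -> categorical value


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d          # row totals (recent, baseline)
    k = a + c                      # column total (records matching the rule)
    denom = math.comb(r1 + r2, k)

    def prob(x: int) -> float:
        # Hypergeometric probability of x matches in the recent row, margins fixed.
        return math.comb(r1, x) * math.comb(r2, k - x) / denom

    p_obs = prob(a)
    support = range(max(0, k - r2), min(r1, k) + 1)
    # Sum every table at least as unlikely as the observed one (tolerance for float ties).
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


def best_rule(recent: List[Record],
              baseline: List[Record]) -> Tuple[float, Optional[Tuple[str, str]]]:
    """Score every one-component rule 'attr = value'; return the lowest p-value."""
    best: Tuple[float, Optional[Tuple[str, str]]] = (1.0, None)
    candidates = sorted({(attr, val) for rec in recent for attr, val in rec.items()})
    for attr, val in candidates:
        a = sum(1 for r in recent if r.get(attr) == val)
        c = sum(1 for r in baseline if r.get(attr) == val)
        # Assumption: only subgroups whose proportion increased in recent data count.
        if a * len(baseline) <= c * len(recent):
            continue
        p = fisher_two_sided(a, len(recent) - a, c, len(baseline) - c)
        if p < best[0]:
            best = (p, (attr, val))
    return best


def permutation_test(recent: List[Record], baseline: List[Record],
                     n_perm: int = 99, seed: int = 0):
    """Compensate the best rule's p-value for having searched over many rules."""
    observed_p, rule = best_rule(recent, baseline)
    pooled = recent + baseline
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any real association with the recent period
        perm_p, _ = best_rule(pooled[:len(recent)], pooled[len(recent):])
        if perm_p <= observed_p:
            hits += 1
    return rule, observed_p, (hits + 1) / (n_perm + 1)


# Toy data (invented for illustration): respiratory cases jump in the recent window.
baseline = [{"symptom": "resp" if i < 10 else "gi",
             "age": "child" if i % 2 else "adult"} for i in range(100)]
recent = [{"symptom": "resp" if i < 20 else "gi",
           "age": "child" if i % 2 else "adult"} for i in range(30)]

rule, p, compensated = permutation_test(recent, baseline)
print(rule, p, compensated)
```

The permutation step matters because the best of many rules will look significant by chance. Comparing the observed best score against the best scores from label-shuffled data gives a p-value that accounts for that search.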