Detecting epidemics using highly noisy data

From cholera, AIDS/HIV, and malaria to rumors and viral videos, understanding the causative network behind an epidemic's spread has repeatedly proven critical for managing that spread (controlling or encouraging it, as the case may be). Current approaches to understanding and predicting epidemics rely on scarce but reliable expert diagnoses. This paper proposes a different way forward: use more readily available but noisier data, with many false negatives and false positives, to determine the causative network of an epidemic. Specifically, we consider an epidemic that spreads according to one of two networks. At some point in time we see a small random subsample (perhaps a vanishingly small fraction) of those infected, along with an order-wise similar number of false positives. We derive sufficient conditions under which the causative network is detectable, and provide an efficient algorithm that solves the hypothesis testing problem. We apply this model to two settings. In the first, we simply want to distinguish between random illness (a complete graph) and an epidemic (spread along a structured graph). In the second, we have a superposition of both, and we wish to detect which is the stronger component.
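The observation model in the first setting can be made concrete with a small simulation. The following is only an illustrative sketch, not the paper's algorithm or its spreading dynamics: the ring graph, the frontier-growth rule, and the names `ring_graph`, `spread`, `random_illness`, and `observe` (with parameters `keep_prob` and `n_false`) are all assumptions chosen for brevity.

```python
import random


def ring_graph(n, k=1):
    """Hypothetical structured graph: node v is linked to its k nearest
    neighbors on each side of a ring of n nodes."""
    return {v: [(v + d) % n for d in range(-k, k + 1) if d != 0]
            for v in range(n)}


def spread(graph, n_infected, rng):
    """Assumed spreading rule (not taken from the paper): start from a
    random seed and repeatedly infect a uniformly chosen susceptible
    neighbor of the infected set. Requires n_infected < len(graph)."""
    infected = {rng.randrange(len(graph))}
    while len(infected) < n_infected:
        frontier = sorted({v for u in infected for v in graph[u]} - infected)
        infected.add(rng.choice(frontier))
    return infected


def random_illness(n, n_infected, rng):
    """Random illness: on a complete graph every node is equally exposed,
    modeled here as a uniformly random infected set."""
    return set(rng.sample(range(n), n_infected))


def observe(infected, n, keep_prob, n_false, rng):
    """Noisy report: each infected node is reported with probability
    keep_prob (the rest are false negatives), and n_false healthy nodes
    are reported by mistake (false positives)."""
    reported = {v for v in sorted(infected) if rng.random() < keep_prob}
    healthy = [v for v in range(n) if v not in infected]
    reported |= set(rng.sample(healthy, n_false))
    return reported


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    epidemic = spread(ring_graph(n), 50, rng)
    illness = random_illness(n, 50, rng)
    print(sorted(observe(epidemic, n, keep_prob=0.2, n_false=10, rng=rng)))
    print(sorted(observe(illness, n, keep_prob=0.2, n_false=10, rng=rng)))
```

Under these assumptions, reports from the structured epidemic tend to cluster along the ring while reports from random illness are scattered; the hypothesis test described above must separate the two cases from the reported set alone, without knowing which reports are false.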
