DAGGER: A sequential algorithm for FDR control on DAGs

We propose a linear-time, single-pass, top-down algorithm for multiple testing on directed acyclic graphs (DAGs), where nodes represent hypotheses and edges specify a partial ordering in which the hypotheses must be tested. The procedure is guaranteed to reject a sub-DAG with bounded false discovery rate (FDR) while satisfying the logical constraint that a rejected node's parents must also be rejected. It is designed for sequential testing settings, in which the DAG structure is known a priori but the $p$-values are obtained selectively (such as in a sequence of experiments); the algorithm is also applicable in non-sequential settings, in which all $p$-values can be calculated in advance (such as variable/model selection). Our DAGGER algorithm, shorthand for Greedily Evolving Rejections on DAGs, provably controls the FDR under independence, positive dependence, or arbitrary dependence of the $p$-values. The DAGGER procedure specializes to known algorithms in the special cases of trees and line graphs, and simplifies to the classical Benjamini-Hochberg procedure when the DAG has no edges. We explore the empirical performance of DAGGER using simulations, as well as a real dataset corresponding to a gene ontology, showing favorable performance in terms of time and power.
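As a point of reference for the edgeless special case mentioned above, the following is a minimal Python sketch of the classical Benjamini-Hochberg step-up procedure, together with a small check of the logical constraint that every rejected node's parents are also rejected. It is not the DAGGER algorithm itself; the function names `benjamini_hochberg` and `respects_parents`, the `parents` dictionary layout, and the example $p$-values are assumptions made purely for illustration.

```python
from typing import Dict, Sequence, Set


def benjamini_hochberg(p_values: Sequence[float], alpha: float) -> Set[int]:
    """Return the indices rejected by the classical BH step-up procedure."""
    m = len(p_values)
    # Indices sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls under its threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    # Step-up: reject the k smallest p-values, even if some of them
    # individually missed their own thresholds.
    return set(order[:k])


def respects_parents(rejected: Set[int], parents: Dict[int, Sequence[int]]) -> bool:
    """Check that every rejected node has all of its parents rejected.

    `parents` maps a node to its parent nodes (hypothetical layout);
    nodes absent from the mapping are treated as roots.
    """
    return all(set(parents.get(v, ())) <= rejected for v in rejected)


if __name__ == "__main__":
    # Illustrative p-values only; not taken from the paper.
    p = [0.001, 0.008, 0.039, 0.035, 0.60]
    print(sorted(benjamini_hochberg(p, alpha=0.05)))

    # A tiny DAG: node 2 has parents 0 and 1, node 3 has parent 2.
    dag_parents = {2: [0, 1], 3: [2]}
    print(respects_parents({0, 1, 2}, dag_parents))
    print(respects_parents({0, 2}, dag_parents))
```

When the DAG has no edges, every node is a root and the parent constraint is vacuous, which is consistent with the reduction to an unstructured procedure described in the abstract.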
