A sequential algorithm for false discovery rate control on directed acyclic graphs

We propose a linear‐time, single‐pass, top‐down algorithm for multiple testing on directed acyclic graphs, where nodes represent hypotheses and edges specify a partial ordering in which the hypotheses must be tested. The procedure is guaranteed to reject a sub‐directed acyclic graph with bounded false discovery rate while satisfying the logical constraint that a rejected node's parents must also be rejected. It is designed for sequential testing settings where the directed acyclic graph structure is known a priori but the p‐values are obtained selectively, such as in a sequence of experiments; however, the algorithm is also applicable in nonsequential settings where all p‐values can be calculated in advance, such as in model selection. Our algorithm provably controls the false discovery rate under independence, positive dependence or arbitrary dependence of the p‐values and specializes to known algorithms in the special cases of trees and line graphs; it simplifies to the classical Benjamini‐Hochberg procedure when the directed acyclic graph has no edges. We explore the empirical performance of our algorithm through simulations and analysis of a real dataset corresponding to a gene ontology, and we demonstrate its favourable performance in terms of computational time and power.
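To make two of the ingredients above concrete, the following is a minimal Python sketch. It is not the proposed algorithm. It shows only the classical Benjamini‐Hochberg step‐up rule, which the abstract names as the edgeless special case, together with a check of the logical constraint that every rejected node's parents are also rejected. The function names `benjamini_hochberg` and `respects_parents`, and the representation of the graph as a dictionary mapping each node to its list of parents, are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set


def benjamini_hochberg(p_values: Sequence[float], alpha: float) -> List[int]:
    """Classical BH step-up rule: return the sorted indices of rejected hypotheses.

    Illustrative only; this is the edgeless special case named in the abstract,
    not the DAG procedure itself.
    """
    n = len(p_values)
    # Indices ordered from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most alpha * k / n.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / n:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


def respects_parents(rejected: Set[str], parents: Dict[str, List[str]]) -> bool:
    """Check the logical constraint: each rejected node has all its parents rejected.

    `parents` is a hypothetical encoding of the DAG (node -> list of parent nodes);
    nodes absent from the dictionary are treated as roots.
    """
    return all(p in rejected for node in rejected for p in parents.get(node, []))


if __name__ == "__main__":
    # Four hypotheses with no ordering constraints, tested at level 0.05.
    print(benjamini_hochberg([0.021, 0.9, 0.02, 0.8], alpha=0.05))

    # A small DAG: a -> b, a -> c, b -> c.
    dag_parents = {"b": ["a"], "c": ["a", "b"]}
    print(respects_parents({"a", "b"}, dag_parents))
    print(respects_parents({"a", "c"}, dag_parents))
```

The step‐up rule rejects every hypothesis up to the largest qualifying rank, even if a smaller p‐value misses its own threshold. That is why the loop records the last passing rank rather than stopping at the first failure.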
