DAGGER: A sequential algorithm for FDR control on DAGs

We propose a linear-time, single-pass, top-down algorithm for multiple testing on directed acyclic graphs (DAGs), where nodes represent hypotheses and edges specify a partial ordering in which hypotheses must be tested. The procedure is guaranteed to reject a sub-DAG with bounded false discovery rate (FDR) while satisfying the logical constraint that a rejected node's parents must also be rejected. It is designed for sequential testing settings, where the DAG structure is known a priori but the $p$-values are obtained selectively (such as in a sequence of experiments); it is also applicable in non-sequential settings where all $p$-values can be calculated in advance (such as variable/model selection). Our DAGGER algorithm, shorthand for Greedily Evolving Rejections on DAGs, provably controls the FDR under independence, positive dependence, or arbitrary dependence of the $p$-values. The DAGGER procedure specializes to known algorithms in the special cases of trees and line graphs, and simplifies to the classical Benjamini-Hochberg procedure when the DAG has no edges. We explore the empirical performance of DAGGER using simulations, as well as a real dataset corresponding to a gene ontology, showing favorable performance in terms of time and power.
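
As a small illustration of two ideas mentioned in the abstract, the sketch below implements the classical Benjamini-Hochberg step-up procedure (the special case the abstract says DAGGER reduces to when the DAG has no edges) and a check of the logical constraint that every rejected node's parents are also rejected. This is not the DAGGER algorithm itself, whose thresholds are not given here. The function names `benjamini_hochberg` and `respects_parent_constraint`, and the `parents` dictionary representation of the DAG, are assumptions made for illustration only.

```python
from typing import Dict, List, Set


def benjamini_hochberg(p_values: List[float], alpha: float) -> Set[int]:
    """Classical BH step-up: return the indices of rejected hypotheses."""
    n = len(p_values)
    # Indices sorted by increasing p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls below its BH threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / n:
            k = rank
    # Reject the k smallest p-values (possibly none).
    return set(order[:k])


def respects_parent_constraint(
    rejected: Set[int], parents: Dict[int, List[int]]
) -> bool:
    """Check that every parent of a rejected node is itself rejected.

    `parents` maps each node to its list of parent nodes (hypothetical
    representation; nodes missing from the map are treated as roots).
    """
    return all(
        parent in rejected
        for node in rejected
        for parent in parents.get(node, [])
    )


if __name__ == "__main__":
    # No edges: plain BH on four p-values at level 0.05.
    print(benjamini_hochberg([0.02, 0.024, 0.9, 0.95], alpha=0.05))
    # A three-node DAG with edges 0 -> 1, 0 -> 2, 1 -> 2.
    dag_parents = {0: [], 1: [0], 2: [0, 1]}
    print(respects_parent_constraint({0, 1}, dag_parents))
    print(respects_parent_constraint({1}, dag_parents))
```

In the first call, the smallest p-value alone misses its threshold, but the second-smallest meets its own, so the step-up rule rejects both. Any valid output on a DAG, under the constraint described in the abstract, would have to pass a check like `respects_parent_constraint`.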

Martin J. Wainwright | Michael I. Jordan | Aaditya Ramdas | Jianbo Chen
