Statistical Significance for Genome-Wide Experiments

As genome-wide experiments and the sequencing of multiple genomes have become more common, the analysis of large data sets has become commonplace in biology. Often, thousands of features in a genome-wide data set are tested against some null hypothesis, and many of them are expected to be significant. Here we propose an approach to statistical significance in the analysis of genome-wide data sets, based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true findings and the number of false positives, one that is automatically calibrated and easily interpreted. Under this approach, a measure of statistical significance called the q-value is associated with each tested feature, in addition to the traditional p-value. Our approach avoids a flood of false positive results while offering a more liberal criterion than those used in genome scans for linkage.
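
To make the idea of a per-feature q-value concrete, the following is a minimal sketch of how q-values might be computed from a list of p-values. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: the function name `q_values`, the parameter `lam`, and the choice to estimate the proportion of truly null features (`pi0`) at a single fixed threshold are all assumptions introduced here.

```python
import numpy as np


def q_values(p_values, lam=0.5):
    """Sketch of q-value estimation from p-values (hypothetical helper).

    Assumption: pi0, the proportion of truly null features, is estimated
    at one fixed tuning value `lam` rather than by a more refined method.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size

    # Null p-values are uniform on [0, 1], so p-values above `lam` come
    # mostly from null features; rescale their share to estimate pi0.
    # On very small inputs this estimate can be 0, which makes every q 0.
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))

    order = np.argsort(p)
    ranked = p[order]

    # Estimated FDR when every feature with p <= ranked[i] is called
    # significant: expected false positives / number of features called.
    fdr = pi0 * m * ranked / np.arange(1, m + 1)

    # The q-value is the smallest estimated FDR over all thresholds at
    # which the feature is still called significant (running minimum
    # taken from the largest p-value downward).
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    q_sorted = np.minimum(q_sorted, 1.0)

    # Return q-values in the same order as the input p-values.
    q = np.empty(m)
    q[order] = q_sorted
    return q
```

With this sketch, calling all features with `q_values(p) <= 0.05` significant corresponds to an estimated false discovery rate of about 5% among the features called, which is how the q-value differs in interpretation from a p-value cutoff.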
