Statistical significance for genomewide studies

With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. Often, thousands of features in a genomewide data set are tested against some null hypothesis, and a number of those features are expected to be significant. Here we propose an approach to measuring statistical significance in these genomewide studies based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted. It associates a measure of statistical significance, called the q value, with each tested feature. The q value is similar to the well-known p value, except that it measures significance in terms of the false discovery rate rather than the false positive rate. Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage.
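To make the idea of a q value concrete, the following is a minimal sketch in Python and not the authors' stated procedure, which the abstract does not spell out. It assumes the q value of a feature is the smallest estimated false discovery rate over all p-value thresholds that would call that feature significant. The function name `q_values` and the parameter `pi0` (an assumed proportion of truly null features) are illustrative choices. With the conservative default `pi0 = 1.0`, the result coincides with Benjamini-Hochberg adjusted p values.

```python
def q_values(p_values, pi0=1.0):
    """Return a q value for each p value, in the input order.

    pi0 is an assumed proportion of truly null features; pi0 = 1.0 is the
    conservative choice and gives Benjamini-Hochberg adjusted p values.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, so each q value is the minimum
    # estimated FDR over all thresholds at or above that p value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        # Estimated FDR when the threshold is set at this p value:
        # expected false positives (pi0 * m * p) over features called (rank).
        fdr_estimate = pi0 * m * p_values[i] / rank
        running_min = min(running_min, fdr_estimate)
        q[i] = running_min
    return q


# Hypothetical p values for five tested features.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = q_values(p)
significant = [i for i, value in enumerate(q) if value <= 0.05]
```

Calling every feature with a q value at or below 0.05 significant aims to keep the expected proportion of false positives among the called features near 5%, whereas a p-value cutoff of 0.05 bounds the false positive rate among truly null features.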
