A tail strength measure for assessing the overall significance in a dataset

We propose an overall measure of significance for a set of hypothesis tests. The tail strength is a simple function of the p-values computed for each of the tests. This measure is useful, for example, in assessing the overall univariate strength of a large set of features in microarray and other genomic and biomedical studies. It also has a simple relationship to the false discovery rate of the collection of tests. We derive the asymptotic distribution of the tail strength measure, and illustrate its use on a number of real datasets.

[1]  R. Dougherty,et al.  Cross‐subject comparison of principal diffusion direction maps , 2005, Magnetic resonance in medicine.

[2]  R. Warnke,et al.  Immune signatures in follicular lymphoma. , 2005, The New England journal of medicine.

[3]  Marcel Dettling,et al.  BagBoosting for tumor classification with gene expression data , 2004, Bioinform..

[4]  L. Staudt,et al.  Prediction of survival in follicular lymphoma based on molecular features of tumor-infiltrating immune cells. , 2004, The New England journal of medicine.

[5]  D. Donoho,et al.  Higher criticism for detecting sparse heterogeneous mixtures , 2004, math/0410072.

[6]  R. Tibshirani,et al.  Toxicity from radiation therapy associated with abnormal transcriptional responses to DNA damage. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[7]  John D. Storey,et al.  Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach , 2004 .

[8]  Lori E. Dodd,et al.  Partial AUC Estimation and Regression , 2003, Biometrics.

[9]  John D. Storey A direct approach to false discovery rates , 2002 .

[10]  Meland,et al.  The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. , 2002, The New England journal of medicine.

[11]  R. Tibshirani,et al.  Diagnosis of multiple cancer types by shrunken centroids of gene expression , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[12]  E. Lander,et al.  Gene expression correlates of clinical prostate cancer behavior. , 2002, Cancer cell.

[13]  T. Poggio,et al.  Prediction of central nervous system embryonal tumour outcome based on gene expression , 2002, Nature.

[14]  John D. Storey,et al.  Empirical Bayes Analysis of a Microarray Experiment , 2001 .

[15]  M. Ringnér,et al.  Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks , 2001, Nature Medicine.

[16]  Y. Benjamini,et al.  On the Adaptive Control of the False Discovery Rate in Multiple Testing With Independent Statistics , 2000 .

[17]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[18]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[19]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[20]  Galen R. Shorack,et al.  Functions of Order Statistics , 1972 .

[21]  S. Dudoit,et al.  STATISTICAL METHODS FOR IDENTIFYING DIFFERENTIALLY EXPRESSED GENES IN REPLICATED cDNA MICROARRAY EXPERIMENTS , 2002 .

[22]  Bradley Efron,et al.  Microarrays empirical Bayes methods, and false discovery rates , 2001 .

[23]  Christopher R. Genovese,et al.  Operating Characteristics and Extensions of the FDR Procedure , 2001 .

[24]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .