Identifying genes from up-down properties of microarray expression series

MOTIVATION: We consider any collection of microarrays that can be ordered to form a progression, for example as a function of time, severity of disease or dose of a stimulant. Plotting the expression level of each gene against time, severity or dose gives an expression series, or curve, for each gene. While most of these curves exhibit only random fluctuations, some contain a pattern, and the corresponding genes are the ones most likely to be associated with the quantity used to order the microarrays.

RESULTS: We introduce a method of identifying patterns in microarray expression curves, and hence the associated genes, without knowing in advance what kind of pattern to look for. Key to our approach is the sequence of ups and downs formed by pairs of consecutive data points in each curve. As a benchmark, we blindly identified genes from yeast cell cycles without selecting for periodic or any other anticipated behaviour.

CONTACT: tmf20@cam.ac.uk

SUPPLEMENTARY INFORMATION: The complete versions of Table 2 and Figure 4, as well as other material, can be found at http://www.lps.ens.fr/~willbran/up-down/ or http://www.tcm.phy.cam.ac.uk/~tmf20/up-down/
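To make the idea of an up-down sequence concrete, the following is a minimal Python sketch, not the authors' implementation. It reduces an expression curve to its string of ups and downs and then counts how many orderings of the same number of distinct values would produce that string, which gives the chance of seeing the string in a purely random curve. The function names, the treatment of ties as "down", and the use of this probability as a score are all assumptions made for illustration; the abstract does not specify them.

```python
from math import factorial


def up_down_signature(series):
    """Return 'U' or 'D' for each pair of consecutive data points.

    Assumption: a tie (equal consecutive values) is recorded as 'D'.
    """
    return "".join("U" if b > a else "D" for a, b in zip(series, series[1:]))


def count_permutations(signature):
    """Count orderings of len(signature) + 1 distinct values with this signature.

    Standard dynamic programme: ways[j] is the number of partial orderings
    whose last element has rank j (0-based) among the elements placed so far.
    """
    ways = [1]
    for step in signature:
        n = len(ways)
        new = [0] * (n + 1)
        running = 0
        if step == "U":
            # The new element must rank above the previous last element.
            for j in range(1, n + 1):
                running += ways[j - 1]
                new[j] = running
        else:
            # The new element must rank below the previous last element.
            for j in range(n - 1, -1, -1):
                running += ways[j]
                new[j] = running
        ways = new
    return sum(ways)


def signature_probability(signature):
    """Probability of this signature if all orderings are equally likely."""
    return count_permutations(signature) / factorial(len(signature) + 1)


# Hypothetical expression curve with four ordered measurements.
curve = [1.0, 2.5, 2.0, 3.1]
sig = up_down_signature(curve)
print(sig, count_permutations(sig), signature_probability(sig))
```

In this sketch a long monotone run such as "UUUU" has only one matching ordering and is therefore unlikely to arise by chance, whereas an alternating string matches many orderings and is correspondingly unremarkable.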
