A new summarization method for Affymetrix probe level data

MOTIVATION: We propose a new model-based technique for summarizing high-density oligonucleotide array data at probe level for Affymetrix GeneChips. The summarization method is based on a factor analysis model whose parameters are optimized by a Bayesian maximum a posteriori (MAP) method under the assumption of Gaussian measurement noise. The RNA concentration is then estimated from the fitted model. In contrast to previous methods, our new method, called 'Factor Analysis for Robust Microarray Summarization (FARMS)', supplies both signal intensity values and P-values indicating interesting information.

RESULTS: We compare FARMS on Affymetrix's spike-in data and Gene Logic's dilution data with established algorithms such as Affymetrix Microarray Suite (MAS) 5.0, Model Based Expression Index (MBEI) and Robust Multi-array Average (RMA). Further, we compare FARMS with 43 other methods via the 'Affycomp II' competition. The experimental results show that FARMS with default parameters outperforms previous methods when sensitivity and specificity are considered simultaneously through the area under the receiver operating characteristic (ROC) curve (AUC). We measured two quantities through the AUC: correctly versus wrongly detected expression changes (fold change), and correctly versus wrongly detected differentially expressed genes between two sets of arrays (P-value). Furthermore, FARMS is computationally less expensive than RMA, MAS and MBEI.

AVAILABILITY: The FARMS R package is available from http://www.bioinf.jku.at/software/farms/farms.html.

SUPPLEMENTARY INFORMATION: http://www.bioinf.jku.at/publications/papers/farms/supplementary.ps
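The factor analysis idea behind the summarization can be illustrated with a minimal sketch. It assumes that a probe set's log-transformed probe intensities are arranged as one vector per array, fits a single-factor model x = λz + ε with z ~ N(0, 1) and diagonal Gaussian noise ε ~ N(0, Ψ), and returns the posterior mean E[z | x] for each array as an illustrative summary value. This is not FARMS itself: the sketch drops the Bayesian prior and fits the parameters by plain maximum-likelihood EM, and the function names `fit_one_factor` and `posterior_factor`, the input layout, the initialization, the iteration count and the variance floor are all hypothetical choices.

```python
import math
import random


def posterior_factor(Xc, lam, psi):
    """E-step: posterior mean of the scalar factor z per array, plus its shared variance."""
    # Posterior precision of z is 1 + lam^T Psi^{-1} lam (identical for every array).
    post_var = 1.0 / (1.0 + sum(l * l / p for l, p in zip(lam, psi)))
    # Posterior mean is post_var * lam^T Psi^{-1} x for each centered array vector x.
    mu = [post_var * sum(l * x / p for l, x, p in zip(lam, row, psi)) for row in Xc]
    return mu, post_var


def fit_one_factor(X, n_iter=200, psi_floor=1e-6):
    """Maximum-likelihood EM for x = lam * z + eps (no prior, so not FARMS itself).

    X is a list of arrays; each array is a list of n log probe intensities for
    one probe set (hypothetical input layout). Returns loadings lam, noise
    variances psi and per-array summaries E[z | x].
    """
    m, n = len(X), len(X[0])
    # Center each probe across arrays so the model has zero mean.
    means = [sum(row[i] for row in X) / m for i in range(n)]
    Xc = [[row[i] - means[i] for i in range(n)] for row in X]
    # Per-probe sample variances: the diagonal of the empirical covariance S.
    var = [sum(row[i] ** 2 for row in Xc) / m for i in range(n)]
    # Hypothetical start: split each probe's variance evenly between signal and noise.
    lam = [math.sqrt(v / 2.0) for v in var]
    psi = [max(v / 2.0, psi_floor) for v in var]
    for _ in range(n_iter):
        mu, post_var = posterior_factor(Xc, lam, psi)
        ezz = sum(post_var + u * u for u in mu)  # sum over arrays of E[z_j^2]
        # sum over arrays of x_ji * E[z_j], for each probe i
        xz = [sum(row[i] * u for row, u in zip(Xc, mu)) for i in range(n)]
        # M-step for the loadings.
        lam = [xz[i] / ezz for i in range(n)]
        # M-step for the noise: Psi_ii = S_ii - lam_i * (1/m) * xz_i, floored to stay positive.
        psi = [max(var[i] - lam[i] * xz[i] / m, psi_floor) for i in range(n)]
    # Final E-step with the fitted parameters gives the per-array summaries.
    mu, _ = posterior_factor(Xc, lam, psi)
    return lam, psi, mu


if __name__ == "__main__":
    # Synthetic probe set: 5 probes, 50 arrays, generated from the model itself.
    rng = random.Random(0)
    true_lam = [1.0, 0.8, 1.2, 0.5, 0.9]
    z = [rng.gauss(0.0, 1.0) for _ in range(50)]
    X = [[8.0 + l * zj + rng.gauss(0.0, 0.2) for l in true_lam] for zj in z]
    lam, psi, mu = fit_one_factor(X)
    print("loadings:", [round(l, 2) for l in lam])
```

Because λ and z are identified only up to a joint sign flip, the sketch starts from positive loadings so that the factor follows the direction in which the probes co-vary. A MAP variant would add a log-prior term on λ to the M-step objective instead of using the plain maximum-likelihood update shown here.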
