Evaluation of normalization methods for cDNA microarray data by k-NN classification

Background
Non-biological factors give rise to unwanted variation in cDNA microarray data, and many normalization methods have been designed to remove it. However, to date few systematic evaluations have been published of how well these techniques remove variation arising from dye biases, particularly in the context of downstream, higher-order analytical tasks such as classification.

Results
Ten location normalization methods that adjust spatial- and/or intensity-dependent dye biases, and three scale methods that adjust scale differences, were applied, individually and in combination, to five distinct, published, cancer biology-related cDNA microarray data sets. Leave-one-out cross-validation (LOOCV) classification error was employed as the quantitative end-point for assessing the effectiveness of a normalization method. Specifically, a well-known classifier, k-nearest neighbor (k-NN), was built from data normalized using a given technique, and the LOOCV error rate of the resulting model was computed. We found that k-NN classifiers are sensitive to dye biases in the data. Using NONRM and GMEDIAN as baseline methods, our results show that single-bias-removal techniques, which remove either spatial-dependent dye bias (hereafter the spatial effect) or intensity-dependent dye bias (hereafter the intensity effect), moderately reduce LOOCV classification errors, whereas double-bias-removal techniques, which remove both the spatial and intensity effects, reduce LOOCV classification errors even further. Of the 41 different strategies examined, three two-step processes, IG LOESS-SL FILTERW7, IST SPLINE-SL LOESS and IG LOESS-SL LOESS, all of which remove the intensity effect globally and the spatial effect locally, appear to reduce LOOCV classification errors most consistently and effectively across all data sets.
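The abstract names the normalization methods but does not define them. The following is a minimal Python sketch (the study itself used R) of three kinds of correction under stated assumptions. It assumes that GMEDIAN subtracts the global median log-ratio from an array, that an intensity-global step such as "IG LOESS" fits a LOESS curve of the log-ratio M = log2(R/G) against the average log-intensity A = ½·log2(R·G) over the whole array and subtracts it, and that a scale step rescales each array's log-ratios to a common median absolute deviation (MAD). All function names, the `frac` span, and the simplified LOESS (local linear fit, tricube weights, no robustness iterations) are hypothetical illustrations, not the authors' implementation. The spatial (SL) steps are not sketched.

```python
import math
from statistics import median

# Illustrative sketch only: names, parameters and the simplified LOESS are hypothetical.

def gmedian_normalize(m):
    """Global median location normalization: shift log-ratios so their median is 0."""
    med = median(m)
    return [x - med for x in m]

def loess_fit(a, m, frac=0.3):
    """Simplified LOESS: local linear fit of m on a with tricube weights.

    Each local fit uses a bandwidth equal to the distance to the
    ceil(frac * n)-th nearest A value; no robustness iterations are run.
    """
    n = len(a)
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for a0 in a:
        h = sorted(abs(ai - a0) for ai in a)[k - 1] or 1e-12
        sw = swa = swm = swaa = swam = 0.0
        for ai, mi in zip(a, m):
            u = abs(ai - a0) / h
            w = (1 - u ** 3) ** 3 if u < 1 else 0.0  # tricube weight
            sw += w
            swa += w * ai
            swm += w * mi
            swaa += w * ai * ai
            swam += w * ai * mi
        denom = sw * swaa - swa * swa
        if abs(denom) < 1e-12:
            fitted.append(swm / sw)  # degenerate window: weighted mean
        else:
            slope = (sw * swam - swa * swm) / denom
            fitted.append((swm - slope * swa) / sw + slope * a0)
    return fitted

def intensity_loess_normalize(red, green, frac=0.3):
    """Remove the intensity effect: subtract a global LOESS curve of M on A."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return [mi - fi for mi, fi in zip(m, loess_fit(a, m, frac))]

def mad(values):
    """Median absolute deviation (unscaled)."""
    med = median(values)
    return median([abs(v - med) for v in values])

def mad_scale_normalize(arrays):
    """Rescale each array's log-ratios to the geometric mean of the arrays' MADs."""
    mads = [mad(x) for x in arrays]  # assumes no array has MAD == 0
    target = math.exp(sum(math.log(s) for s in mads) / len(mads))
    return [[v * target / s for v in x] for x, s in zip(arrays, mads)]
```

In this sketch, a linear dye bias (M rising linearly with A) is removed exactly by `intensity_loess_normalize`, because a local linear fit reproduces any straight line. A two-step strategy of the kind the abstract favours would then apply a separate spatial correction to the intensity-corrected log-ratios.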
We also found that the investigated scale normalization methods do not reduce LOOCV classification error.

Conclusion
Using the LOOCV error of k-NN classifiers as the evaluation criterion, three double-bias-removal normalization strategies, IG LOESS-SL FILTERW7, IST SPLINE-SL LOESS and IG LOESS-SL LOESS, outperform the other strategies for removing the spatial effect, the intensity effect and scale differences from cDNA microarray data. The apparent sensitivity of k-NN LOOCV classification error to dye biases suggests that this criterion provides an informative measure for evaluating normalization methods. All the computational tools used in this study were implemented in R, the language for statistical computing and graphics.
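The evaluation criterion can be sketched as follows. Each array is held out in turn, a k-NN classifier is built from the remaining arrays, and the LOOCV error is the fraction of held-out arrays that are misclassified. The abstract does not state the distance measure, the value of k, or the tie-breaking rule used in the study. This Python sketch therefore assumes Euclidean distance between profiles of normalized log-ratios, majority voting with ties resolved in favour of the closest tied neighbour, and a default of k = 3. The function names are hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch: distance, k and tie-breaking are assumptions, not the study's settings.

def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training samples nearest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    nearest = order[:k]
    votes = Counter(train_y[i] for i in nearest)
    best = max(votes.values())
    # Break ties in favour of the tied class whose member is closest to the query.
    for i in nearest:
        if votes[train_y[i]] == best:
            return train_y[i]

def loocv_error(samples, labels, k=3):
    """Fraction of samples misclassified when each one is held out in turn."""
    n = len(samples)
    errors = 0
    for i in range(n):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, samples[i], k) != labels[i]:
            errors += 1
    return errors / n
```

For two well-separated toy clusters, `samples = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]` with labels `["A", "A", "A", "B", "B", "B"]`, `loocv_error(samples, labels, k=3)` returns 0.0. Comparing this value across differently normalized versions of the same data set is the kind of comparison the criterion supports.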