Imputing Missing Data for Gene Expression Arrays

The singular value decomposition offers an interesting and stable method for imputation of missing values in gene expression arrays. The basic paradigm is

• Learn a set of basis functions or eigen-genes from the complete data.
• Impute the missing cells for a gene by regressing its non-missing entries on the eigen-genes, and use the regression function to predict the expression values at the missing locations.

∗Depts. of Statistics, and Health, Research & Policy, Sequoia Hall, Stanford Univ., CA 94305; hastie@stat.stanford.edu
†Depts. of Health, Research & Policy, and Statistics, Stanford Univ.; tibs@stat.stanford.edu
‡Life Sciences Division, Lawrence Orlando Berkeley National Labs & Dept. of Molecular and Cell Biology, University of California, Berkeley; eisen@genome.stanford.edu
§Department of Biochemistry, Stanford University; pbrown@cmgm.stanford.edu
¶Department of Genetics, Stanford University; botstein@genome.stanford.edu
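The two-step paradigm stated in the abstract (learn eigen-genes from the complete data, then regress each incomplete gene on them) can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not necessarily the authors' exact procedure: the function name `svd_impute` and the `rank` parameter are hypothetical, rows are taken to be genes and columns to be arrays, missing cells are encoded as NaN, no centering or scaling is applied, and the eigen-genes are taken to be the leading right singular vectors of the fully observed rows.

```python
import numpy as np

def svd_impute(X, rank=2):
    """Sketch of SVD-based imputation (hypothetical helper, not from the paper).

    X    : 2-D array, genes in rows, arrays in columns, NaN marks a missing cell.
    rank : number of eigen-genes to keep (an assumed tuning parameter).
    """
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)  # genes with no missing cells

    # Step 1: learn eigen-genes from the complete rows only.
    _, _, Vt = np.linalg.svd(X[complete], full_matrices=False)
    V = Vt[:rank].T  # shape (n_arrays, rank); columns are eigen-genes

    # Step 2: for each incomplete gene, regress its observed entries on the
    # eigen-genes restricted to the observed columns, then predict the rest.
    X_hat = X.copy()
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])
        beta, *_ = np.linalg.lstsq(V[obs], X[i, obs], rcond=None)
        X_hat[i, ~obs] = V[~obs] @ beta
    return X_hat
```

Calling `svd_impute(X, rank=2)` returns a copy of `X` in which only the NaN cells have been replaced; observed values are left untouched. The sketch assumes that enough genes are fully observed to estimate the eigen-genes and that each incomplete gene has at least `rank` observed entries; otherwise the least-squares fit is underdetermined and `lstsq` falls back to a minimum-norm solution.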