Spatial smoothing and hot spot detection for CGH data using the fused lasso.

We apply the "fused lasso" regression method of (TSRZ2004) to the problem of "hot- spot detection", in particular, detection of regions of gain or loss in comparative genomic hybridization (CGH) data. The fused lasso criterion leads to a convex optimization problem, and we provide a fast algorithm for its solution. Estimates of false-discovery rate are also provided. Our studies show that the new method generally outperforms competing methods for calling gains and losses in CGH data.

[1]  David W. Scott The New S Language , 1990 .

[2]  John L. Prince,et al.  Discussion and Future Work , 1994 .

[3]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[4]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[5]  Bradley Efron,et al.  Microarrays empirical Bayes methods, and false discovery rates , 2001 .

[6]  Christian A. Rees,et al.  Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[7]  John D. Storey A direct approach to false discovery rates , 2002 .

[8]  M. Wigler,et al.  Circular binary segmentation for the analysis of array-based DNA copy number data. , 2004, Biostatistics.

[9]  Ajay N. Jain,et al.  Hidden Markov models approach to the analysis of array CGH data , 2004 .

[10]  Franck Picard,et al.  A statistical approach for array CGH data analysis , 2005, BMC Bioinformatics.

[11]  B. Ripley,et al.  Semiparametric Regression: Preface , 2003 .

[12]  Sun-Yuan Kung,et al.  Accurate detection of aneuploidies in array CGH and gene expression microarray data , 2004, Bioinform..

[13]  Peter J. Park,et al.  Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data , 2005, Bioinform..

[14]  L. Recht,et al.  High-resolution genome-wide mapping of genetic alterations in human glial brain tumors. , 2005, Cancer research.

[15]  Yonatan Aumann,et al.  Efficient Calculation of Interval Scores for DNA Copy Number Data Analysis , 2005, RECOMB.

[16]  R. Tibshirani,et al.  Sparsity and smoothness via the fused lasso , 2005 .

[17]  R. Tibshirani,et al.  A method for calling gains and losses in array CGH data. , 2005, Biostatistics.

[18]  Christian J Stoeckert,et al.  STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. , 2006, Genome research.

[19]  E. S. Venkatraman,et al.  A faster circular binary segmentation algorithm for the analysis of array CGH data , 2007, Bioinform..