GIMscan: A New Statistical Method for Analyzing Whole-Genome Array CGH Data

Genetic instability is an important class of biomarkers for cancer and many other diseases. Array comparative genomic hybridization (aCGH) is a high-throughput cytogenetic technique that can efficiently detect genome-wide genetic instability events such as chromosomal gain, loss, and more complex aneuploidy, collectively known as genome imbalance (GIM). We propose a new statistical method, Genome Imbalance Scanner (GIMscan), for automatically decoding the underlying DNA dosage states from aCGH data. GIMscan captures both the intrinsic (nonrandom) spatial change of genome hybridization intensities and the prevalent (random) measurement noise introduced during data acquisition, and it simultaneously segments each chromosome and assigns a dosage state to each segment. We tested the proposed method on both simulated data and real data measured from a colorectal cancer population, and we report competitive or superior performance of GIMscan in comparison with popular existing methods.
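To make the idea of joint segmentation and state assignment concrete, the following is a minimal illustrative sketch and not the GIMscan model itself. It substitutes a plain three-state Gaussian hidden Markov model decoded with the Viterbi algorithm. The state names, mean log2 ratios, noise level, and transition probability are all assumed values chosen for illustration, and `viterbi_segment` is a hypothetical helper name.

```python
import math
import random

# Hypothetical dosage states with assumed mean log2 ratios (illustrative only).
STATES = ["loss", "normal", "gain"]
MEANS = [-0.5, 0.0, 0.5]
SIGMA = 0.15       # assumed standard deviation of measurement noise
STAY_PROB = 0.98   # assumed probability that adjacent clones share a state


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def viterbi_segment(log_ratios):
    """Return the most likely state index per clone under the toy HMM."""
    if not log_ratios:
        return []
    k = len(STATES)
    log_stay = math.log(STAY_PROB)
    log_switch = math.log((1 - STAY_PROB) / (k - 1))
    # Uniform prior over states for the first clone.
    score = [math.log(1 / k) + log_gauss(log_ratios[0], MEANS[s], SIGMA)
             for s in range(k)]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for x in log_ratios[1:]:
        prev, score, ptr = score, [], []
        for s in range(k):
            cands = [prev[r] + (log_stay if r == s else log_switch)
                     for r in range(k)]
            best = max(range(k), key=lambda r: cands[r])
            ptr.append(best)
            score.append(cands[best] + log_gauss(x, MEANS[s], SIGMA))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Simulate one chromosome arm: normal, then a gained region, then normal.
    random.seed(0)
    truth = [1] * 40 + [2] * 20 + [1] * 40
    data = [random.gauss(MEANS[s], SIGMA) for s in truth]
    calls = viterbi_segment(data)
    print("".join(STATES[s][0] for s in calls))  # one letter per clone
```

In this toy model the state means are fixed constants, so it captures only the random measurement noise; it does not model the nonrandom spatial drift in hybridization intensity that the abstract says GIMscan accounts for.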
