A statistical model for microarrays, optimal estimation algorithms, and limits of performance

DNA microarray technology relies on the hybridization process, which is stochastic in nature. Currently, probabilistic cross-hybridization of nonspecific targets and the shot noise (Poisson noise) originating from the binding of specific targets are among the main obstacles to achieving high accuracy in DNA microarray analysis. In this paper, statistical techniques are used to model the hybridization and cross-hybridization processes and, based on the model, optimal algorithms are employed to detect the targets and to estimate their quantities. To verify the theory, two sets of microarray experiments are conducted: one with oligonucleotide targets and the other with complementary DNA (cDNA) targets in the presence of biological background. Both experiments indicate that appropriately modeling the cross-hybridization interference yields a significant improvement in accuracy over conventional methods such as direct readout. This supports the claim that the accuracy of microarrays can become exclusively noise-limited rather than interference-limited (i.e., limited by cross-hybridization). The techniques presented in this paper can potentially yield considerable increases in the signal-to-noise ratio (SNR), dynamic range, and resolution of DNA and protein microarrays, as well as of other affinity-based biosensors. A preliminary study of the Cramér-Rao bound for estimating the target concentrations suggests that, in some regimes, cross-hybridization may even be beneficial, a result with potential ramifications for probe design, which is currently focused on minimizing cross-hybridization. Finally, in its current form, the proposed method is best suited to low-density arrays such as those arising in diagnostics, single nucleotide polymorphism (SNP) detection, and toxicology. How to scale it to high-density arrays (with many thousands of spots) remains an interesting challenge.
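
As a rough illustration of why modeling cross-hybridization can outperform direct readout, the sketch below simulates a three-spot array under a simplified linear model. It is not the paper's algorithm. The coupling matrix `Q` (entry `Q[i, j]` is the assumed fraction of target `j` captured at spot `i`), the target amounts, and the function names are all hypothetical. A non-negative least-squares fit, solved here by projected gradient descent, stands in for the optimal estimators the abstract refers to.

```python
import numpy as np

def simulate_readout(Q, c, rng):
    """Shot-noise readout: spot i captures a Poisson((Q c)_i) number of molecules."""
    return rng.poisson(Q @ c).astype(float)

def direct_readout(Q, y):
    """Conventional estimate: attribute each spot's signal to its own target only."""
    return y / np.diag(Q)

def nonneg_least_squares(Q, y, iters=2000):
    """Projected gradient descent for min ||Q c - y||^2 subject to c >= 0."""
    step = 1.0 / np.linalg.norm(Q, 2) ** 2   # 1 / (largest singular value)^2
    c = np.zeros(Q.shape[1])
    for _ in range(iters):
        # Gradient step on the squared error, then projection onto c >= 0.
        c = np.maximum(c - step * (Q.T @ (Q @ c - y)), 0.0)
    return c

rng = np.random.default_rng(0)

# Hypothetical coupling matrix: diagonal = specific binding,
# off-diagonal = cross-hybridization (assumed known from calibration).
Q = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.70, 0.20],
              [0.05, 0.10, 0.85]])

c_true = np.array([5000.0, 0.0, 200.0])      # assumed true target amounts
y = simulate_readout(Q, c_true, rng)

c_direct = direct_readout(Q, y)              # ignores cross-hybridization
c_model = nonneg_least_squares(Q, y)         # accounts for it

err_direct = np.linalg.norm(c_direct - c_true)
err_model = np.linalg.norm(c_model - c_true)
print("direct readout error:", err_direct)
print("model-based error:   ", err_model)
```

In this toy setting, direct readout reports a sizeable amount of the absent second target because the abundant first target binds to its spot, whereas the model-based estimate is limited mainly by the Poisson noise. A least-squares criterion is used only for simplicity; an estimator matched to the Poisson statistics would weight the spots differently.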
