Compressive Sensing DNA Microarrays

Compressive sensing microarrays (CSMs) are DNA-based sensors that operate on group testing and compressive sensing (CS) principles. In a conventional DNA microarray, each genetic sensor is designed to respond to a single target; in a CSM, each sensor responds to a set of targets. We study the problem of designing CSMs that satisfy both the constraints imposed by CS theory and the biochemistry of probe-target DNA hybridization. We propose a cross-hybridization model suited to CSMs and, based on this model, develop several methods for probe design and CS signal recovery. Laboratory experiments suggest that, to achieve accurate hybridization profiling, consensus probe sequences must have at least 80% sequence homology with every target to be detected. Furthermore, datasets collected out of equilibrium are usually as accurate as those obtained under equilibrium conditions. Consequently, CSMs can be used in applications that allow only short hybridization times.
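The idea that each sensor responds to a set of targets can be illustrated with a toy linear measurement model y = Φx, where x holds target concentrations and Φ holds probe-target affinities. The sketch below is an illustration only and not the probe design or recovery method developed in the paper. It assumes an idealized binary affinity matrix, noiseless readings, and a single target present (a 1-sparse x), and all names in it (`PHI`, `measure`, `recover_one_sparse`) are hypothetical.

```python
# Toy CSM model: 3 probe spots, 7 candidate targets (assumed sizes).
# PHI[i][j] = 1 means probe i hybridizes with target j. Column j holds the
# binary digits of j + 1, so all columns are distinct and nonzero.
N_PROBES, N_TARGETS = 3, 7
PHI = [[((j + 1) >> i) & 1 for j in range(N_TARGETS)] for i in range(N_PROBES)]


def measure(phi, x):
    """Return the probe readings y = PHI x (noiseless, linear response)."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def recover_one_sparse(phi, y):
    """Find the single column, and its scale, that best explains y.

    Each candidate column is fitted by least squares; the column with the
    smallest squared residual is returned together with its amplitude.
    """
    best = None
    for j in range(len(phi[0])):
        col = [row[j] for row in phi]
        norm2 = sum(c * c for c in col)
        amp = sum(c * v for c, v in zip(col, y)) / norm2
        resid = sum((v - amp * c) ** 2 for c, v in zip(col, y))
        if best is None or resid < best[0]:
            best = (resid, j, amp)
    return best[1], best[2]


# One target (index 4) present at concentration 2.5; all others absent.
x_true = [0.0] * N_TARGETS
x_true[4] = 2.5

y = measure(PHI, x_true)
j_hat, amp_hat = recover_one_sparse(PHI, y)
print(j_hat, amp_hat)
```

Three probe readings are enough to identify one of seven targets here because every target triggers a distinct subset of probes. Real CSMs have non-binary, sequence-dependent affinities and several targets present at once, which is why the paper relies on a cross-hybridization model and dedicated CS recovery algorithms rather than this exhaustive column matching.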
