Sequence alignment by rare event simulation

We present a new stochastic method for finding the optimal alignment of DNA sequences. The method works by generating random paths through a graph (the edit graph) according to a Markov chain. Each path is assigned a score, and these scores are used to modify the transition probabilities of the Markov chain. This procedure converges to a fixed path through the graph, corresponding to the optimal (or near-optimal) sequence alignment. The rules for updating the transition probabilities are based on Rubinstein's (1999, 2000) cross-entropy method, a new technique for stochastic optimization. This leads to very simple and natural updating formulas. Because of its versatility, mathematical tractability and simplicity, the method has great potential for a large class of combinatorial optimization problems, particularly in the biological sciences.
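To make the procedure described above concrete, here is a minimal sketch in Python. It is not the authors' implementation: the scoring scheme (match +1, mismatch -1, gap -2), the elite fraction `rho`, the smoothing weight `alpha`, and all function names (`ce_align`, `sample_path`, `path_score`, `nw_score`) are hypothetical choices for illustration. Each node (i, j) of the edit graph has its own transition probabilities over three moves (diagonal, down, right). In each iteration the sketch samples paths from (0, 0) to (m, n), keeps the best ceil(rho * N) of them (a common variant of the quantile threshold), and sets the new probability of leaving (i, j) by move k to the fraction of elite paths through (i, j) that took move k, smoothed with the old value. A Needleman-Wunsch dynamic program is included only to check the result.

```python
import math
import random

# Hypothetical scoring scheme (an assumption, not from the paper).
MATCH, MISMATCH, GAP = 1, -1, -2
# Edit-graph moves: diagonal (align a[i] with b[j]), down (gap in b), right (gap in a).
MOVES = ((1, 1), (1, 0), (0, 1))


def allowed(i, j, m, n):
    """Indices of the moves that stay inside the (m+1) x (n+1) edit graph."""
    return [k for k, (di, dj) in enumerate(MOVES) if i + di <= m and j + dj <= n]


def path_score(path, a, b):
    """Score a path given as a list of move indices starting at node (0, 0)."""
    i = j = score = 0
    for k in path:
        di, dj = MOVES[k]
        if (di, dj) == (1, 1):
            score += MATCH if a[i] == b[j] else MISMATCH
        else:
            score += GAP
        i, j = i + di, j + dj
    return score


def sample_path(P, m, n, rng):
    """Walk the Markov chain from (0, 0) to (m, n) using transition probabilities P."""
    i = j = 0
    path = []
    while (i, j) != (m, n):
        k = rng.choices(range(3), weights=P[i][j])[0]
        path.append(k)
        di, dj = MOVES[k]
        i, j = i + di, j + dj
    return path


def ce_align(a, b, n_samples=200, rho=0.1, alpha=0.7, n_iter=50, seed=0):
    """Cross-entropy search for a high-scoring alignment path (illustrative sketch)."""
    rng = random.Random(seed)
    m, n = len(a), len(b)
    # Start with uniform probabilities over the moves allowed at each node.
    P = [[[0.0] * 3 for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            ks = allowed(i, j, m, n)
            for k in ks:
                P[i][j][k] = 1.0 / len(ks)
    n_elite = max(1, math.ceil(rho * n_samples))
    best_score, best_path = -math.inf, None
    for _ in range(n_iter):
        paths = [sample_path(P, m, n, rng) for _ in range(n_samples)]
        scored = sorted(((path_score(p, a, b), p) for p in paths),
                        key=lambda t: t[0], reverse=True)
        elite = scored[:n_elite]
        if elite[0][0] > best_score:
            best_score, best_path = elite[0]
        # Count how often elite paths leave each node via each move.
        counts = [[[0] * 3 for _ in range(n + 1)] for _ in range(m + 1)]
        for _, p in elite:
            i = j = 0
            for k in p:
                counts[i][j][k] += 1
                di, dj = MOVES[k]
                i, j = i + di, j + dj
        # CE update with smoothing; nodes that no elite path visits keep their old row.
        for i in range(m + 1):
            for j in range(n + 1):
                total = sum(counts[i][j])
                if total:
                    P[i][j] = [alpha * c / total + (1 - alpha) * old
                               for c, old in zip(counts[i][j], P[i][j])]
    return best_score, best_path


def nw_score(a, b):
    """Exact optimal global alignment score (Needleman-Wunsch), for comparison."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * GAP
    for j in range(1, n + 1):
        D[0][j] = j * GAP
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            D[i][j] = max(D[i - 1][j - 1] + sub, D[i - 1][j] + GAP, D[i][j - 1] + GAP)
    return D[m][n]
```

For example, `ce_align("GATTACA", "GCATGCA")` returns a score and a move list that can be compared against `nw_score("GATTACA", "GCATGCA")`. Because the search is stochastic, the score it finds can only be at most the exact optimum; how often it reaches the optimum depends on `n_samples`, `rho`, `alpha` and `n_iter`.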

[1] S. B. Needleman et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 1970.

[2] Daniel S. Hirschberg et al. Algorithms for the longest common subsequence problem. JACM, 1977.

[3] M. S. Waterman et al. Identification of common molecular subsequences. Journal of Molecular Biology, 1981.

[4] David J. Lipman et al. Multiple alignment, communication cost, and graph matching, 1992.

[5] Jun S. Liu et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 1993.

[6] D. Haussler et al. Hidden Markov models in computational biology: applications to protein modeling. Journal of Molecular Biology, 1993.

[7] P. D. Carr et al. X-ray structure of the signal transduction protein from Escherichia coli at 1.9 Å. Acta Crystallographica Section D, Biological Crystallography, 1996.

[8] Dan Gusfield et al. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, 1997.

[9] R. Rubinstein et al. Quick estimation of rare events in stochastic networks, 1997.

[10] Luca Maria Gambardella et al. Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1997.

[11] Dan Gusfield. Algorithms on Strings, Trees, and Sequences: More Applications of Suffix Trees, 1997.

[12] H. V. Westerhoff et al. GlnK, a PII-homologue: structure reveals ATP binding site and indicates how the T-loops may be involved in molecular recognition. Journal of Molecular Biology, 1998.

[13] R. Rubinstein. The Cross-Entropy Method for Combinatorial and Continuous Optimization, 1999.

[14] Knut Reinert et al. A polyhedral approach to sequence alignment problems. Discrete Applied Mathematics, 2000.

[15] Walter J. Gutjahr et al. A graph-based Ant System and its convergence. Future Generation Computer Systems, 2000.

[16] Pieter Tjerk de Boer et al. Analysis and efficient simulation of queueing models of telecommunications systems, 2000.

[17] Dirk P. Kroese et al. Combinatorial optimization via cross-entropy, 2004.

[18] Christus et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins, 2022.