Combinatorial Pattern Matching

Given $n$ fragments from $k > 2$ genomes, we will show how to find an optimal chain of colinear non-overlapping fragments in $O(n \log^{k-2} n \log\log n)$ time and $O(n \log^{k-2} n)$ space. Our result solves an open problem posed by Myers and Miller, because it reduces the time complexity of their algorithm by a factor of $\log^2 n / \log\log n$ and the space complexity by a factor of $\log n$. For $k = 2$ genomes, our algorithm takes $O(n \log n)$ time and $O(n)$ space.
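
As a quick check on the stated reduction factors, assume the Myers and Miller algorithm runs in $O(n \log^{k} n)$ time and $O(n \log^{k-1} n)$ space. Those bounds are not given in this abstract and are taken here as an assumption. Dividing their bounds by the new ones gives:

```latex
\frac{n \log^{k} n}{n \log^{k-2} n \,\log\log n} = \frac{\log^{2} n}{\log\log n},
\qquad
\frac{n \log^{k-1} n}{n \log^{k-2} n} = \log n .
```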
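
For the two-genome case, the $O(n \log n)$ time and $O(n)$ space bound can be illustrated with a standard sweep-line sketch. This is not the authors' algorithm. The fragment layout `(begin1, end1, begin2, end2, weight)`, the helper names `PrefixMaxTree` and `optimal_chain`, and the scoring rule (chain score = sum of fragment weights, no gap costs) are all hypothetical choices made for illustration, and a Fenwick tree for prefix maxima stands in for whatever range-maximum structure the authors actually use. A fragment f' may precede f in a chain when f' ends strictly before f begins in both genomes.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical fragment layout (assumption, not from the source):
# (begin1, end1, begin2, end2, weight), inclusive coordinates in genomes 1 and 2.
Fragment = Tuple[int, int, int, int, float]


class PrefixMaxTree:
    """Fenwick tree answering prefix-maximum queries over (score, index) pairs."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.tree = [(0.0, -1)] * (size + 1)  # (0.0, -1) means "no predecessor"

    def update(self, pos: int, value: Tuple[float, int]) -> None:
        # pos is 1-based; raise the stored maximum along the update path.
        while pos <= self.size:
            if value > self.tree[pos]:
                self.tree[pos] = value
            pos += pos & -pos

    def query(self, pos: int) -> Tuple[float, int]:
        # Maximum over slots 1..pos; pos == 0 returns the neutral element.
        best = (0.0, -1)
        while pos > 0:
            if self.tree[pos] > best:
                best = self.tree[pos]
            pos -= pos & -pos
        return best


def optimal_chain(fragments: List[Fragment]) -> Tuple[float, List[int]]:
    """Hypothetical helper: best chain score and fragment indices for k = 2."""
    if not fragments:
        return 0.0, []
    # Compress genome-2 end coordinates to Fenwick slots 1..m.
    ends2 = sorted({f[3] for f in fragments})
    tree = PrefixMaxTree(len(ends2))

    # Sweep over genome-1 coordinates. At equal x, begin events (kind 0) come
    # before end events (kind 1), so a fragment ending at x cannot precede
    # a fragment beginning at x.
    events = []
    for i, (b1, e1, _, _, _) in enumerate(fragments):
        events.append((b1, 0, i))
        events.append((e1, 1, i))
    events.sort()

    score = [0.0] * len(fragments)
    pred = [-1] * len(fragments)
    for _, kind, i in events:
        _, _, b2, e2, w = fragments[i]
        if kind == 0:
            # Best finished fragment whose genome-2 end lies strictly before b2.
            best, j = tree.query(bisect_left(ends2, b2))
            score[i] = best + w
            pred[i] = j
        else:
            # Fragment i has ended in genome 1: publish it at its genome-2 end.
            tree.update(bisect_left(ends2, e2) + 1, (score[i], i))

    # Trace back from the highest-scoring chain end.
    end = max(range(len(fragments)), key=lambda i: score[i])
    best_score = score[end]
    chain = []
    while end != -1:
        chain.append(end)
        end = pred[end]
    chain.reverse()
    return best_score, chain
```

For example, with fragments `[(0, 2, 0, 2, 3), (3, 5, 3, 5, 4), (1, 4, 6, 8, 10), (6, 8, 9, 10, 2)]` the sketch returns `(12.0, [2, 3])`: the third fragment overlaps the first two in genome 1, but chaining it with the last fragment beats the chain through the first, second, and last fragments (3 + 4 + 2 = 9). Extending this sweep to $k > 2$ genomes would require range-maximum queries over $k - 1$ coordinates instead of a one-dimensional prefix maximum.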
