Computational Biology

A sensitive method for multiple sequence alignment should be able to align local motifs that are contained in some, but not necessarily all, of the input sequences. In addition, it should be possible to integrate several such partial local alignments into a single multiple output alignment. This raises the question of the consistency of partial alignments. Based on a new set-theoretical definition of sequence alignment, the consistency problem is discussed theoretically, and a recently developed library of C functions for consistency calculation (GABIOS-LIB) is described. GABIOS-LIB has been integrated into the DIALIGN alignment program to carry out consistency tests during the multiple alignment procedure. While the resulting alignments are exactly the same as with the previous version of DIALIGN, the running time of the program has been substantially improved: for large data sets, the new version of DIALIGN is up to 120 times faster than the old version.

Availability: http://bibiserv.TechFak.Uni-Bielefeld.DE/dialign/
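
To make the notion of consistency concrete, the following is a minimal sketch in Python. It is not GABIOS-LIB or its API (that library is written in C and is not reproduced here). It assumes that partial alignments are given as pairs of sites `(sequence_id, position)`, and it treats a set of pairs as consistent when the sites can be grouped into columns such that no column holds two sites of the same sequence and the columns can be ordered without contradicting the order of positions within any sequence. The function name `is_consistent` and the input representation are hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque


def is_consistent(pairs):
    """Return True if the aligned site pairs fit into one multiple alignment.

    pairs: iterable of ((seq_a, pos_a), (seq_b, pos_b)) tuples (assumed format).
    """
    parent = {}

    def find(site):
        # Union-find lookup with path halving; unseen sites start as singletons.
        parent.setdefault(site, site)
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    # Step 1: merge aligned sites into classes (candidate alignment columns).
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Step 2: collect the aligned positions of each sequence.
    positions_by_seq = defaultdict(list)
    for seq, pos in parent:
        positions_by_seq[seq].append(pos)

    # Step 3: consecutive aligned sites of a sequence force an order on classes.
    classes = {find(site) for site in list(parent)}
    successors = defaultdict(set)
    indegree = defaultdict(int)
    for seq, positions in positions_by_seq.items():
        positions.sort()
        for p, q in zip(positions, positions[1:]):
            class_p, class_q = find((seq, p)), find((seq, q))
            if class_p == class_q:
                return False  # two sites of one sequence in the same column
            if class_q not in successors[class_p]:
                successors[class_p].add(class_q)
                indegree[class_q] += 1

    # Step 4: the columns can be ordered only if this graph has no cycle
    # (Kahn's topological sort visits every class exactly when it is acyclic).
    queue = deque(c for c in classes if indegree[c] == 0)
    visited = 0
    while queue:
        current = queue.popleft()
        visited += 1
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return visited == len(classes)
```

For example, the pairs `(("A", 1), ("B", 1))`, `(("B", 2), ("C", 1))` and `(("C", 2), ("A", 0))` are each unproblematic on their own, yet together they are inconsistent, because the three columns they define would have to precede one another in a cycle. This sketch recomputes everything from scratch on each call; it only illustrates the definition and makes no claim about how the library achieves its speed.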
