Improvement of Clustal-derived sequence alignments with evolutionary algorithms

Multiple sequence alignment (MSA) is a central problem in bioinformatics. In this study, we extended previous efforts to apply evolutionary algorithms (EAs) to MSA. Candidate solutions in the initial population were derived from the well-known alignment program Clustal X, and evolutionary computation was then used to evolve improved alignments from them. Three new alignment operators were introduced and tested on protein sequence alignment. Alignment quality statistics were generated for selected benchmarks from the BAliBASE database, using the BLOSUM62 substitution matrix. Our results indicate the degree to which EAs can improve on the alignments produced by Clustal X. Moreover, the experiments show that the commonly used sum-of-pairs scoring scheme is not always a reliable guide: alignments with a higher sum-of-pairs score do not always have higher quality as measured by the BAliBASE sum-of-pairs score.
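To make the approach concrete, the following is a minimal sketch of sum-of-pairs scoring and of an elitist evolutionary loop seeded from an existing alignment. It is not the method used in the study. The match, mismatch, and gap values are toy assumptions standing in for BLOSUM62 and the study's gap penalties, and `shift_gap` is a hypothetical mutation operator, not one of the three operators introduced here. The seed alignment would in practice come from Clustal X; below it is simply a list of equal-length gapped strings.

```python
import random
from itertools import combinations

# Toy scoring parameters (assumptions; the study used BLOSUM62).
MATCH, MISMATCH, GAP = 2, -1, -2


def pair_score(a, b):
    """Score one aligned position for a pair of sequences."""
    if a == "-" and b == "-":
        return 0  # gap-gap positions contribute nothing
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment):
    """Sum the pairwise scores over all sequence pairs and all columns."""
    return sum(
        pair_score(a, b)
        for s, t in combinations(alignment, 2)
        for a, b in zip(s, t)
    )


def shift_gap(alignment, rng):
    """Hypothetical operator: swap one gap with a neighbouring position in one row.

    Residue order within the row is preserved, so the underlying
    sequence is never changed, only the placement of gaps.
    """
    rows = list(alignment)
    i = rng.randrange(len(rows))
    row = list(rows[i])
    gaps = [k for k, c in enumerate(row) if c == "-"]
    if not gaps:
        return rows
    k = rng.choice(gaps)
    j = k + rng.choice((-1, 1))
    if 0 <= j < len(row):
        row[k], row[j] = row[j], row[k]
    rows[i] = "".join(row)
    return rows


def evolve(seed, generations=200, pop_size=10, rng=None):
    """Elitist loop: parents and offspring compete, best pop_size survive."""
    rng = rng or random.Random(0)
    population = [list(seed) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [shift_gap(p, rng) for p in population]
        population = sorted(
            population + offspring, key=sum_of_pairs, reverse=True
        )[:pop_size]
    return population[0]


if __name__ == "__main__":
    seed = ["ACG-T", "AC-GT", "ACGT-"]  # stand-in for a Clustal X alignment
    best = evolve(seed)
    print(sum_of_pairs(seed), sum_of_pairs(best), best)
```

Because parents survive alongside their offspring, the best score in the population never falls below that of the seed. As the abstract cautions, a higher sum-of-pairs score under the chosen matrix does not guarantee closer agreement with a reference alignment, so a sketch like this optimizes the objective function rather than alignment quality itself.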
