Multiple Sequence Alignment Using SAGA: Investigating the Effects of Operator Scheduling, Population Seeding, and Crossover Operators

Multiple sequence alignment (MSA) is a fundamental problem of great importance in molecular biology. In this study, we investigated several aspects of SAGA, a well-known evolutionary algorithm (EA) for solving MSA problems. The SAGA algorithm is important because it represents a successful attempt at applying EAs to MSA and since it is the first EA to use operator scheduling on this problem. However, it is largely undocumented which elements of SAGA are vital to its performance. An important finding in this study is that operator scheduling does not improve the performance of SAGA compared to a uniform selection of operators. Furthermore, the experiments show that seeding SAGA with a ClustalW-derived alignment allows the algorithm to discover alignments of higher quality compared to the traditional initialization scheme with randomly generated alignments. Finally, the experimental results indicate that SAGA’s performance is largely unaffected when the crossover operators are disabled. Thus, the major determinant of SAGA’s success seems to be the mutation operators and the scoring functions used.

[1]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[2]  Kumar Chellapilla,et al.  Multiple sequence alignment using evolutionary programming , 1999, Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406).

[3]  Lawrence Davis,et al.  Adapting Operator Probabilities in Genetic Algorithms , 1989, ICGA.

[4]  Gary B. Fogel,et al.  Improvement of clustal-derived sequence alignments with evolutionary algorithms , 2003, The 2003 Congress on Evolutionary Computation, 2003. CEC '03..

[5]  D. Higgins,et al.  T-Coffee: A novel method for fast and accurate multiple sequence alignment. , 2000, Journal of molecular biology.

[6]  Sandeep K. Gupta,et al.  Improving the Practical Space and Time Efficiency of the Shortest-Paths Approach to Sum-of-Pairs Multiple Sequence Alignment , 1995, J. Comput. Biol..

[7]  Moon-Jung Chung,et al.  Multiple sequence alignment using simulated annealing , 1994, Comput. Appl. Biosci..

[8]  Gary B. Fogel,et al.  A Clustal alignment improver using evolutionary algorithms , 2002, Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No.02TH8600).

[9]  Liisa Holm,et al.  COFFEE: an objective function for multiple sequence alignments , 1998, Bioinform..

[10]  D. Higgins,et al.  SAGA: sequence alignment by genetic algorithm. , 1996, Nucleic acids research.

[11]  Tao Jiang,et al.  On the Complexity of Multiple Sequence Alignment , 1994, J. Comput. Biol..

[12]  Olivier Poch,et al.  BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs , 1999, Bioinform..