Selection of appropriate metaheuristic algorithms for protein structure prediction in AB off-lattice model: a perspective from fitness landscape analysis

Predicting a protein's structure from its primary sequence, known as protein structure prediction (PSP), is a challenging task in computational biology. PSP is an optimization problem: the goal is to find the stable, or native, structure with minimum free energy. Several researchers have applied various heuristic algorithms and their variants to this problem. However, there is no established a priori mechanism for selecting a particular algorithm. Fitness landscape analysis (FLA) is a technique for determining the characteristics, or structural features, of a problem, on the basis of which the most appropriate algorithm can be recommended. The aim of this study is two-fold. First, the structural features of the PSP problem are determined using standard FLA techniques. Second, the performance of some well-known optimization algorithms is analyzed against those structural features, illustrating the usefulness of the first step. We determine the structural features by analyzing landscapes generated with a quasi-random sampling technique and the city block distance. Comprehensive simulations are carried out on both artificial and real protein sequences in the 2D and 3D AB off-lattice models. Numerical results indicate that the complexity of the PSP problem increases with protein sequence length. We calculate the Pearson correlation coefficient between the FLA measures separately for the 2D and 3D off-lattice models, and significant differences are identified among the measures. Six well-known real-coded optimization algorithms are evaluated on the same set of protein sequences, and their performance is subsequently analyzed based on the structural features. Finally, we suggest the most appropriate algorithms for solving different classes of the PSP problem.
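As a rough illustration of the ingredients named in the abstract (quasi-random sampling, city block distance, and Pearson correlation), the following minimal Python sketch estimates a fitness distance correlation on a sampled landscape. It rests on several assumptions that are not taken from the paper: a Halton sequence stands in for the quasi-random sampler, a sphere function stands in for the AB off-lattice energy, the search box [-π, π] is an illustrative choice, and fitness distance correlation is used only as one familiar example of an FLA measure. All function names are hypothetical.

```python
import math

PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)  # one Halton base per dimension


def van_der_corput(index, base):
    """Radical inverse of a positive integer index in the given base."""
    value, scale = 0.0, 1.0
    while index > 0:
        scale /= base
        value += scale * (index % base)
        index //= base
    return value


def halton_sample(n_points, dim, low, high):
    """Quasi-random points in [low, high]^dim (Halton sequence, an assumed stand-in)."""
    if dim > len(PRIMES):
        raise ValueError("add more prime bases for higher dimensions")
    return [
        [low + (high - low) * van_der_corput(i, PRIMES[d]) for d in range(dim)]
        for i in range(1, n_points + 1)
    ]


def city_block(a, b):
    """City block (Manhattan) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson(xs, ys):
    """Pearson correlation coefficient; returns NaN if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        return float("nan")
    return cov / (sd_x * sd_y)


def fitness_distance_correlation(points, fitness):
    """Correlate fitness with city block distance to the best sampled point."""
    values = [fitness(p) for p in points]
    best = points[values.index(min(values))]  # minimization: lowest value is best
    distances = [city_block(p, best) for p in points]
    return pearson(values, distances)


def sphere(x):
    """Placeholder objective, NOT the AB off-lattice energy."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    sample = halton_sample(500, 3, -math.pi, math.pi)
    print("FDC estimate:", fitness_distance_correlation(sample, sphere))
```

For a minimization problem, a value close to 1 suggests that fitness improves steadily as points approach the best sample, which is usually read as an easier landscape. Replacing `sphere` with an actual energy function would be the natural next step, under the same caveats.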
