Adding Probabilistic Dependencies to the Search of Protein Side Chain Configurations Using EDAs

The problem of finding an optimal positioning for the side chain residues of a protein is called the side chain placement or side chain prediction problem. It can be posed as a discrete optimization problem. In this paper we use an estimation of distribution algorithm (EDA) to address it. Using a set of 50 difficult protein instances, we show that adding dependencies between the variables of the probabilistic model can improve the quality of the solutions achieved for most of the instances considered. However, we also show that the added dependencies improve solution quality only when information about the known interactions between the residues is used to create the probabilistic model.
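
To make the idea concrete, the following is a minimal sketch of an EDA whose probabilistic model contains pairwise dependencies taken from a fixed, assumed-known interaction structure. It is not the algorithm evaluated in the paper: the tree-shaped model, the toy energy tables, and all names (`energy`, `learn_tree_model`, `sample`, `tree_eda`, `single`, `pair`, `parent`, `order`) are hypothetical choices made for illustration. Each residue picks one rotamer index, and the energy is a sum of per-residue terms and pairwise terms over interacting residues.

```python
import random
from collections import Counter

def energy(x, single, pair):
    """Per-residue terms plus pairwise terms over interacting residue pairs."""
    e = sum(single[i][xi] for i, xi in enumerate(x))
    e += sum(table[x[i]][x[j]] for (i, j), table in pair.items())
    return e

def learn_tree_model(selected, parent, n_rot, alpha=1.0):
    """Estimate p(x_i | x_parent(i)) with Laplace smoothing; roots use p(x_i)."""
    model = []
    for i, p in enumerate(parent):
        counts = Counter((ind[p] if p is not None else None, ind[i])
                         for ind in selected)
        parent_vals = [None] if p is None else range(n_rot[p])
        cond = {}
        for pv in parent_vals:
            weights = [counts[(pv, v)] + alpha for v in range(n_rot[i])]
            total = sum(weights)
            cond[pv] = [w / total for w in weights]
        model.append(cond)
    return model

def sample(model, parent, order, rng):
    """Ancestral sampling: every parent is sampled before its children."""
    x = [None] * len(parent)
    for i in order:
        pv = x[parent[i]] if parent[i] is not None else None
        probs = model[i][pv]
        x[i] = rng.choices(range(len(probs)), weights=probs)[0]
    return x

def tree_eda(single, pair, parent, order,
             pop_size=200, n_gen=30, trunc=0.3, seed=0):
    """Truncation-selection EDA with a fixed tree-structured model and elitism."""
    rng = random.Random(seed)
    n_rot = [len(s) for s in single]
    score = lambda x: energy(x, single, pair)
    pop = [[rng.randrange(k) for k in n_rot] for _ in range(pop_size)]
    best = min(pop, key=score)
    for _ in range(n_gen):
        pop.sort(key=score)
        selected = pop[: int(trunc * pop_size)]
        model = learn_tree_model(selected, parent, n_rot)
        pop = [sample(model, parent, order, rng) for _ in range(pop_size)]
        best = min(pop + [best], key=score)  # keep the best solution seen so far
    return best, score(best)

# Toy instance (made-up numbers): 4 residues, 3 rotamers each,
# with interactions only between consecutive residues.
single = [[0.0, 1.0, 0.5], [0.2, 0.0, 0.9], [0.3, 0.1, 0.0], [0.0, 0.4, 0.2]]
clash = [[2.0 if a == b else 0.0 for b in range(3)] for a in range(3)]
pair = {(0, 1): clash, (1, 2): clash, (2, 3): clash}
parent = [None, 0, 1, 2]   # model edges follow the assumed interaction graph
order = [0, 1, 2, 3]       # sampling order with parents first

best, best_e = tree_eda(single, pair, parent, order)
print(best, best_e)
```

Setting every entry of `parent` to `None` turns the model into a product of univariate marginals, which gives a simple way to compare a model with dependencies against one without them on the same toy instance.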
