SVMCRYS: an SVM approach for the prediction of protein crystallization propensity from protein sequence.

X-ray crystallography is the most widely used method for determining the three-dimensional structure of proteins. Selecting target proteins that can yield high-quality crystals for X-ray crystallography is a challenging task, and predicting crystallization propensity from sequence information can guide that selection. Support vector machines have recently been applied widely to biological problems. In this work, we present SVMCRYS, a method that uses a support vector machine to classify protein sequences as 'amenable to crystallization' or 'resistant to crystallization'. SVMCRYS was trained on a dataset of 728 sequences that gave diffraction-quality crystals and 728 sequences for which work had been stopped before crystals were obtained. The performance of SVMCRYS was compared with that of other sequence-based crystallization prediction methods (SECRET, CRYSTALP, OB-Score, ParCrys and XtalPred) on three different datasets. SVMCRYS achieved a better prediction rate, with higher sensitivity and specificity. Our analysis suggests that SVMCRYS can be used to distinguish proteins that are amenable to crystallization from those that are difficult to crystallize. The SVMCRYS software, dataset and feature set can be obtained from http://www3.ntu.edu.sg/home/EPNSugan/index_files/svmcrys.htm.
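As a rough illustration of the evaluation described above, the sketch below computes sensitivity and specificity for a binary crystallization classifier, together with a simple sequence-derived feature vector. The abstract does not say which features SVMCRYS uses, so the amino acid composition feature and the function names `composition` and `sensitivity_specificity` are assumptions made only for this example, not the authors' implementation.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in a sequence.

    Assumed, illustrative feature; not necessarily the SVMCRYS feature set.
    Non-standard letters (e.g. 'X') are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels.

    Label convention assumed here:
    1 = amenable to crystallization, 0 = resistant to crystallization.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    truth = [1, 1, 0, 0]
    predicted = [1, 0, 0, 0]
    print(sensitivity_specificity(truth, predicted))
```

A feature vector such as the one returned by `composition` could be passed to any SVM implementation; reporting sensitivity and specificity separately shows whether a classifier favours one of the two classes, even on a balanced set such as the 728/728 training data.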
