A machine learning approach for the identification of odorant binding proteins from sequence-derived properties

Background: Odorant binding proteins (OBPs) are believed to shuttle odorants from the environment to the underlying odorant receptors, for which they could potentially serve as odorant presenters. Although several sequence-based search methods have been exploited for protein family prediction, less effort has been devoted to predicting OBPs from sequence data. This task is more challenging because sequence identity between these proteins is poor.

Results: In this paper, we propose a new algorithm that uses a Regularized Least Squares Classifier (RLSC) in conjunction with multiple physicochemical properties of amino acids to predict odorant binding proteins. The algorithm was applied to a dataset derived from the Pfam and GenDiS databases, and we obtained an overall prediction accuracy of 97.7% (94.5% and 98.4% for the positive and negative classes, respectively).

Conclusion: Our study suggests that RLSC is potentially useful for predicting odorant binding proteins from sequence-derived properties irrespective of sequence similarity. Our method correctly predicts 92.8% of 56 odorant binding proteins that are non-homologous to any protein in the Swiss-Prot database and 97.1% of the 414 proteins in an independent dataset, suggesting that the RLSC method can facilitate the prediction of odorant binding proteins from sequence information.
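The classification step named in the abstract can be illustrated with a minimal sketch of a kernel RLSC. It assumes the standard closed-form solution (K + λnI)c = y, an RBF kernel, and labels in {+1, -1}. The kernel choice, the hyperparameter values (`lam`, `gamma`), the function names, and the toy feature vectors standing in for physicochemical descriptors are all assumptions made for illustration, not the authors' settings or data.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def rlsc_fit(X, y, lam=1e-2, gamma=0.5):
    """Solve (K + lam * n * I) c = y for the coefficient vector c.

    X: (n, d) feature matrix; y: (n,) labels in {+1, -1}.
    lam and gamma are illustrative defaults, not tuned values.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), np.asarray(y, dtype=float))

def rlsc_predict(X_train, c, X_new, gamma=0.5):
    """Classify new points by the sign of f(x) = sum_i c_i * k(x, x_i)."""
    return np.sign(rbf_kernel(X_new, X_train, gamma) @ c)

if __name__ == "__main__":
    # Toy data (hypothetical): two well-separated clusters standing in for
    # feature vectors of binding and non-binding proteins.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(2.0, 0.5, size=(20, 2)),
                   rng.normal(-2.0, 0.5, size=(20, 2))])
    y = np.array([1] * 20 + [-1] * 20)
    c = rlsc_fit(X, y)
    train_acc = (rlsc_predict(X, c, X) == y).mean()
    print(f"training accuracy on toy data: {train_acc:.3f}")
```

Unlike a support vector machine, this formulation needs only one linear solve per choice of `lam` and `gamma`, so it is cheap to train on datasets of a few hundred to a few thousand sequences. In practice both hyperparameters would be selected by cross-validation.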
