Function-Function Correlated Multi-Label Protein Function Prediction over Interaction Networks

Many previous computational methods for protein function prediction predict one function at a time, which is equivalent to assuming that the functional categories of proteins are independent. However, biological processes are highly correlated and often occur together, so it is beneficial to treat protein function prediction as a single task in which all functional categories form one correlated prediction target. Leveraging function-function correlations is expected to improve overall predictive accuracy. To this end, we develop a novel network-based protein function prediction approach, under the framework of multi-label classification in machine learning, that exploits these correlations. Besides formulating the function-function correlations explicitly in the optimization objective, we also exploit them implicitly as part of the pairwise protein-protein similarities. The algorithm is built upon the Green's function over a graph, which not only employs the global topology of a network but also captures its local structural information. We evaluate the proposed approach on Saccharomyces cerevisiae. The encouraging experimental results demonstrate the effectiveness of the proposed method.
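To make the idea concrete, the following is a minimal sketch of correlation-aware label propagation with a graph Green's function. It is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes that the Green's function is the pseudo-inverse of the combinatorial Laplacian L = D - W, that function-function correlation is the cosine similarity between annotation columns, and that scores are computed as F = G Y C. The function names (`greens_function`, `function_correlation`, `predict_scores`) and the toy network are hypothetical.

```python
import numpy as np


def greens_function(W):
    """Assumed definition: pseudo-inverse of the graph Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.pinv(L)


def function_correlation(Y):
    """Assumed correlation: cosine similarity between the columns of Y."""
    norms = np.linalg.norm(Y, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unused functions
    Yn = Y / norms
    return Yn.T @ Yn


def predict_scores(W, Y):
    """Score every (protein, function) pair as F = G Y C (an assumption)."""
    G = greens_function(W)
    C = function_correlation(Y)
    return G @ Y @ C


# Toy interaction network: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

# Known annotations (rows: proteins, columns: functions).
# Proteins 2 and 3 are unannotated.
Y = np.zeros((6, 3))
Y[0, [0, 1]] = 1.0  # protein 0 has functions 0 and 1
Y[1, 0] = 1.0       # protein 1 has function 0
Y[4, 2] = 1.0       # protein 4 has function 2
Y[5, 2] = 1.0       # protein 5 has function 2

G = greens_function(W)
C = function_correlation(Y)
F = predict_scores(W, Y)

# Rank candidate functions for each unannotated protein by score.
for p in (2, 3):
    print(p, np.argsort(-F[p]))
```

In this toy network, protein 2 sits in the triangle annotated with functions 0 and 1, so it scores higher than protein 3 on those functions, and the reverse holds for function 2. Because functions 0 and 1 co-occur on protein 0, the correlation matrix lets evidence for one raise the score of the other.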
