Function-Function Correlated Multi-label Protein Function Prediction over Interaction Networks

Many previous approaches to protein function prediction predict one function at a time, which implicitly assumes that the functional categories are independent. However, biological processes are highly correlated and often occur together; it is therefore beneficial to treat protein function prediction as a single task in which all functional categories form one correlated prediction target. Leveraging the function-function correlations is expected to improve overall predictive accuracy. To this end, we develop a network-based protein function prediction approach, within the multi-label classification framework of machine learning, that exploits these correlations. Besides formulating the function-function correlations explicitly in the optimization objective, we also exploit them implicitly as part of the pairwise protein-protein similarities. The algorithm is built upon the Green's function over a graph, which uses the global topology of a network while also capturing its local structures. In addition, we propose an adaptive decision boundary method to deal with the unbalanced distribution of protein annotation data. Finally, we quantify the statistical confidence of predicted functions to facilitate post-processing in proteomic analysis. We evaluate the proposed approach on Saccharomyces cerevisiae data, and the experimental results are very encouraging.
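As a rough illustration of the idea of propagating labels with a graph Green's function and then mixing scores across correlated functions, a minimal sketch follows. It is not the formulation of this paper, whose objective is not given in the abstract. It assumes the Green's function is taken as the pseudo-inverse of the combinatorial Laplacian L = D - W, that function-function correlation is measured by cosine similarity between annotation columns, and that scores are computed as G Y C. The function names and the toy network are hypothetical.

```python
import numpy as np

def greens_function(W):
    """Assumed definition: pseudo-inverse of the graph Laplacian L = D - W."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    return np.linalg.pinv(L)

def label_correlation(Y):
    """Assumed correlation: cosine similarity between the columns of Y."""
    norms = np.linalg.norm(Y, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for unused functions
    Yn = Y / norms
    return Yn.T @ Yn

def predict_scores(W, Y):
    """Propagate annotations over the network, then mix correlated functions."""
    G = greens_function(W)
    C = label_correlation(Y)
    return G @ Y @ C

# Hypothetical toy network: 4 proteins connected in a path 0-1-2-3.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Rows are proteins, columns are functions; proteins 1 and 2 are unannotated.
Y = np.array([[1, 1],
              [0, 0],
              [0, 0],
              [0, 1]], dtype=float)

scores = predict_scores(W, Y)        # one real-valued score per (protein, function)
```

In this sketch a fixed threshold on `scores` would still be needed to obtain binary annotations; the adaptive decision boundary mentioned in the abstract addresses that step and is not reproduced here.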
