Correlated Protein Function Prediction via Maximization of Data-Knowledge Consistency

Conventional computational approaches to protein function prediction typically predict one function at a time, treating protein functions as separate target classes. In reality, however, biological processes are highly correlated, so the functions assigned to a protein are not independent. It is therefore beneficial to exploit function category correlations when predicting protein functions. In this article, we propose a novel Maximization of Data-Knowledge Consistency (MDKC) approach that exploits function category correlations for protein function prediction. Our approach rests on the assumption that two proteins are likely to share a large portion of their annotated functions if they are highly similar according to certain experimental data. We first establish a new pairwise protein similarity from protein annotations, which represents the knowledge perspective. Then, by maximizing the consistency between this knowledge similarity, derived from annotations, and the data similarity, derived from biological experiments, we assign putative functions to unannotated proteins. Most importantly, function category correlations are naturally incorporated into our learning objective through the knowledge similarity. Comprehensive experimental evaluations on Saccharomyces cerevisiae demonstrate promising results that validate the performance of our method.
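The idea of comparing a knowledge similarity with a data similarity can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the MDKC objective itself: function correlations are taken as cosine similarities between columns of a binary annotation matrix, the knowledge similarity is formed as K = Y C Y^T, and consistency is measured as the normalized inner product of the two similarity matrices. The names `function_correlation`, `knowledge_similarity`, `consistency`, and the toy matrices are hypothetical.

```python
import numpy as np

def function_correlation(Y):
    """Cosine similarity between function columns of Y (proteins x functions).
    Assumed stand-in for a function category correlation measure."""
    norms = np.linalg.norm(Y, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for unused functions
    Yn = Y / norms
    return Yn.T @ Yn

def knowledge_similarity(Y, C):
    """Pairwise protein similarity from annotations, weighted by the
    function correlations C (assumed form: K = Y C Y^T)."""
    return Y @ C @ Y.T

def consistency(K, W):
    """Normalized inner product between two similarity matrices
    (an assumed, simple consistency score)."""
    return np.sum(K * W) / (np.linalg.norm(K) * np.linalg.norm(W))

# Toy annotations: proteins 0 and 1 share functions 0 and 1,
# proteins 2 and 3 share function 2.
Y = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)

# Toy data similarity (e.g. from interaction data) with the same block structure.
W = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.8],
              [0.1, 0.1, 0.8, 1.0]])

# The same data similarity with proteins 1 and 2 swapped, so that it no
# longer agrees with the annotations.
perm = [0, 2, 1, 3]
W_perm = W[np.ix_(perm, perm)]

C = function_correlation(Y)
K = knowledge_similarity(Y, C)

print("consistent data:  ", consistency(K, W))
print("inconsistent data:", consistency(K, W_perm))
```

In this toy setting the score is higher when the data similarity agrees with the annotation structure than when it does not, which is the intuition behind choosing annotations for unannotated proteins so that the two similarities agree as closely as possible.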
