Predicting Protein-Protein Interactions from Multimodal Biological Data Sources via Nonnegative Matrix Tri-Factorization

Because high-throughput experimental methods for discovering protein interactions have a high false positive rate, computational methods are necessary to complete the interactome expeditiously. However, when building classification models to identify putative protein interactions, positive samples are easy to choose from truly interacting protein pairs, whereas negative samples are usually very hard to select: non-interacting protein pairs are simply those that currently lack experimental or computational evidence of a physical interaction or a functional association, and they could still interact in reality. To tackle this difficulty, instead of relying on heuristics as many existing works do, in this paper we address it in a principled way by formulating protein interaction prediction from a new mathematical perspective, sparse matrix completion, and we propose a novel Nonnegative Matrix Tri-Factorization (NMTF) based matrix completion approach to predict new protein interactions from existing protein interaction networks. Because matrix completion requires only positive samples and no negative samples, the challenge faced by existing classification-based methods for protein interaction prediction is circumvented. Using manifold regularization, we further extend our method to integrate different biological data sources, such as protein sequences, gene expression, and protein structure information. Extensive experimental results on the Saccharomyces cerevisiae genome show that our new methods outperform related state-of-the-art protein interaction prediction methods.
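To make the matrix completion idea concrete, the following is a minimal sketch and not the method of this paper. It assumes a generic, unconstrained tri-factorization X ≈ F S Gᵀ fitted with standard multiplicative updates, treats unobserved pairs as zeros, and omits manifold regularization entirely. The function names (`nmtf`, `matmul`, `mult_update`, `frob_error`), the toy network, and the choices of rank and iteration count are all illustrative assumptions.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def mult_update(M, num, den, eps=1e-9):
    """Element-wise M * num / den; nonnegative inputs stay nonnegative."""
    return [[m * n / (d + eps) for m, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]


def frob_error(X, F, S, G):
    """Squared Frobenius norm of X - F S G^T."""
    R = matmul(matmul(F, S), transpose(G))
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))


def nmtf(X, k, iters=300, seed=0):
    """Hypothetical helper: factorize nonnegative X (n x m) as F S G^T with rank k."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(0.1, 1.0) for _ in range(cols)] for _ in range(rows)]

    n, m = len(X), len(X[0])
    F, S, G = rand(n, k), rand(k, k), rand(m, k)
    Xt = transpose(X)
    errors = [frob_error(X, F, S, G)]
    for _ in range(iters):
        # F <- F * (X G S^T) / (F S G^T G S^T)
        GS = matmul(G, transpose(S))
        F = mult_update(F, matmul(X, GS), matmul(F, matmul(transpose(GS), GS)))
        # G <- G * (X^T F S) / (G S^T F^T F S)
        FS = matmul(F, S)
        G = mult_update(G, matmul(Xt, FS), matmul(G, matmul(transpose(FS), FS)))
        # S <- S * (F^T X G) / (F^T F S G^T G)
        Ft, Gt = transpose(F), transpose(G)
        num = matmul(matmul(Ft, X), G)
        den = matmul(matmul(matmul(Ft, F), S), matmul(Gt, G))
        S = mult_update(S, num, den)
        errors.append(frob_error(X, F, S, G))
    return F, S, G, errors


# Toy interaction network (assumed data): two groups of three proteins,
# with the pair (0, 1) left unobserved inside the first group.
X = [
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
F, S, G, errors = nmtf(X, k=2)
R = matmul(matmul(F, S), transpose(G))
# Symmetrize, since an interaction score should not depend on pair order.
scores = [[(R[i][j] + R[j][i]) / 2 for j in range(6)] for i in range(6)]
print("score for unobserved pair (0, 1):", round(scores[0][1], 3))
print("score for cross-group pair (0, 3):", round(scores[0][3], 3))
```

Only observed (positive) interactions enter the input; candidate interactions are then ranked by their reconstructed scores, so no negative training set has to be chosen. A faithful implementation would also need the paper's actual objective, including the manifold regularization terms built from sequence, expression, and structure similarities, none of which this sketch attempts to reproduce.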
