Predicting Protein-Protein Interactions from Multimodal Biological Data Sources via Nonnegative Matrix Tri-Factorization

Protein interactions are central to all the biological processes and structural scaffolds in living organisms, because they orchestrate a number of cellular processes such as metabolic pathways and immunological recognition. Several high-throughput methods, for example, yeast two-hybrid system and mass spectrometry method, can help determine protein interactions, which, however, suffer from high false-positive rates. Moreover, many protein interactions predicted by one method are not supported by another. Therefore, computational methods are necessary and crucial to complete the interactome expeditiously. In this work, we formulate the problem of predicting protein interactions from a new mathematical perspective--sparse matrix completion, and propose a novel nonnegative matrix factorization (NMF)-based matrix completion approach to predict new protein interactions from existing protein interaction networks. Through using manifold regularization, we further develop our method to integrate different biological data sources, such as protein sequences, gene expressions, protein structure information, etc. Extensive experimental results on four species, Saccharomyces cerevisiae, Drosophila melanogaster, Homo sapiens, and Caenorhabditis elegans, have shown that our new methods outperform related state-of-the-art protein interaction prediction methods.

[1]  Jason Weston,et al.  Mismatch String Kernels for SVM Protein Classification , 2002, NIPS.

[2]  Emmanuel J. Candès,et al.  Matrix Completion With Noise , 2009, Proceedings of the IEEE.

[3]  J. Thornton,et al.  Protein–protein interfaces: Analysis of amino acid conservation in homodimers , 2001, Proteins.

[4]  D. Frishman,et al.  A domain interaction map based on phylogenetic profiling. , 2004, Journal of molecular biology.

[5]  Benjamin A. Shoemaker,et al.  Deciphering Protein–Protein Interactions. Part II. Computational Methods to Predict Protein and Domain Interaction Partners , 2007, PLoS Comput. Biol..

[6]  Feiping Nie,et al.  Dyadic transfer learning for cross-domain image classification , 2011, 2011 International Conference on Computer Vision.

[7]  Benjamin A. Shoemaker,et al.  Deciphering Protein–Protein Interactions. Part I. Experimental Techniques and Databases , 2007, PLoS Comput. Biol..

[8]  B. Frey,et al.  Integrating high-throughput genetic interaction mapping and high-content screening to explore yeast spindle morphogenesis , 2010, The Journal of cell biology.

[9]  Yanjun Qi,et al.  Random Forest Similarity for Protein-Protein Interaction Prediction from Multiple Sources , 2004, Pacific Symposium on Biocomputing.

[10]  Mei Liu,et al.  Prediction of protein-protein interactions using random decision forest framework , 2005, Bioinform..

[11]  Alessandro Vespignani,et al.  Global protein function prediction from protein-protein interaction networks , 2003, Nature Biotechnology.

[12]  Raja Jothi,et al.  Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions. , 2006, Journal of molecular biology.

[13]  E. Marcotte,et al.  Predicting functional linkages from gene fusions with confidence. , 2002, Applied bioinformatics.

[14]  B. Schwikowski,et al.  A network of protein–protein interactions in yeast , 2000, Nature Biotechnology.

[15]  Chris H. Q. Ding,et al.  Convex and Semi-Nonnegative Matrix Factorizations , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Feiping Nie,et al.  Cross-language web page classification via dual knowledge transfer using nonnegative matrix tri-factorization , 2011, SIGIR.

[17]  Julio Collado-Vides,et al.  A powerful non-homology method for the prediction of operons in prokaryotes , 2002, ISMB.

[18]  Mark Pagel,et al.  Predicting Functional Gene Links from Phylogenetic-Statistical Analyses of Whole Genomes , 2005, 2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05).

[19]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[20]  Matteo Pellegrini,et al.  Prolinks: a database of protein functional linkages derived from coevolution , 2004, Genome Biology.

[21]  J. Thornton,et al.  Structural characterisation and functional significance of transient protein-protein interactions. , 2003, Journal of molecular biology.

[22]  Jiawei Han,et al.  Non-negative Matrix Factorization on Manifold , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[23]  Temple F. Smith,et al.  Operons in Escherichia coli: genomic analyses and predictions. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[24]  S. Salzberg,et al.  Prediction of operons in microbial genomes. , 2001, Nucleic acids research.

[25]  Jean-Loup Faulon,et al.  Predicting protein-protein interactions using signature products , 2005, Bioinform..

[26]  E. Sprinzak,et al.  Utilizing logical relationships in genomic data to decipher cellular processes , 2005, The FEBS journal.

[27]  D. Eisenberg,et al.  Inference of protein function and protein linkages in Mycobacterium tuberculosis based on prokaryotic genome organization: a combined computational approach , 2003, Genome Biology.

[28]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[29]  C. DeLisi,et al.  Genes linked by fusion events are generally of the same functional category: A systematic analysis of 30 microbial genomes , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[30]  Fillia Makedon,et al.  Fast Nonnegative Matrix Tri-Factorization for Large-Scale Data Co-Clustering , 2011, IJCAI.

[31]  Chris H. Q. Ding,et al.  Simultaneous clustering of multi-type relational data via symmetric nonnegative matrix tri-factorization , 2011, CIKM '11.

[32]  Chris H. Q. Ding,et al.  Non-negative Laplacian Embedding , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[33]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[34]  C. Ding,et al.  On the Equivalence of Nonnegative Matrix Factorization and K-means - Spectral Clustering , 2005 .

[35]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[36]  Chris H. Q. Ding,et al.  Orthogonal nonnegative matrix t-factorizations for clustering , 2006, KDD '06.

[37]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[38]  U. Feige,et al.  Spectral Graph Theory , 2015 .

[39]  William Stafford Noble,et al.  A structural alignment kernel for protein structures , 2007, Bioinform..

[40]  Quanquan Gu,et al.  Co-clustering on manifolds , 2009, KDD.

[41]  David L. Wheeler,et al.  GenBank , 2015, Nucleic Acids Res..

[42]  Frederick P. Roth,et al.  Predicting co-complexed protein pairs using genomic and proteomic data integration , 2004, BMC Bioinformatics.

[43]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[44]  D. Eisenberg,et al.  Detecting protein function and protein-protein interactions from genome sequences. , 1999, Science.

[45]  T. Ito,et al.  Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[46]  F. Cohen,et al.  Co-evolution of proteins with their interaction partners. , 2000, Journal of molecular biology.

[47]  Emmanuel J. Candès,et al.  Exact Matrix Completion via Convex Optimization , 2008, Found. Comput. Math..

[48]  R. Russell,et al.  The relationship between sequence and interaction divergence in proteins. , 2003, Journal of molecular biology.

[49]  M. Gerstein,et al.  A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data , 2003, Science.

[50]  Anton J. Enright,et al.  Protein interaction maps for complete genomes based on gene fusion events , 1999, Nature.

[51]  Emmanuel J. Candès,et al.  Exact Matrix Completion via Convex Optimization , 2009, Found. Comput. Math..

[52]  Thomas Madej,et al.  Evolutionary plasticity of protein families: Coupling between sequence and structure variation , 2005, Proteins.

[53]  S. Hubbard,et al.  Conservation of orientation and sequence in protein domain--domain interactions. , 2005, Journal of molecular biology.

[54]  William Stafford Noble,et al.  Large-scale prediction of protein-protein interactions from structures , 2010, BMC Bioinformatics.

[55]  Juwen Shen,et al.  Predicting protein–protein interactions based only on sequences information , 2007, Proceedings of the National Academy of Sciences.

[56]  D. Eisenberg,et al.  Use of Logic Relationships to Decipher Protein Network Organization , 2004, Science.

[57]  William Stafford Noble,et al.  Kernel methods for predicting protein-protein interactions , 2005, ISMB.

[58]  Feiping Nie,et al.  Nonnegative Matrix Tri-factorization Based High-Order Co-clustering and Its Fast Implementation , 2011, 2011 IEEE 11th International Conference on Data Mining.

[59]  Olgica Milenkovic Wei Dai,et al.  Low‐Rank Matrix Completion for Inference of Protein‐Protein Interaction Networks , 2010 .

[60]  Mona Singh,et al.  Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps , 2005, ISMB.

[61]  Xue-wen Chen,et al.  Sequence-based prediction of protein interaction sites with an integrative method , 2009, Bioinform..

[62]  S. Teichmann The constraints protein-protein interactions place on sequence divergence. , 2002, Journal of molecular biology.

[63]  Arun K. Ramani,et al.  Exploiting the co-evolution of interacting proteins to discover interaction specificity. , 2003, Journal of molecular biology.