Using manifold embedding for assessing and predicting protein interactions from high-throughput experimental data

MOTIVATION High-throughput protein interaction data, with ever-increasing volume, are becoming the foundation of many biological discoveries, and thus high-quality protein-protein interaction (PPI) maps are critical for a deeper understanding of cellular processes. However, the unreliability and paucity of current available PPI data are key obstacles to the subsequent quantitative studies. It is therefore highly desirable to develop an approach to deal with these issues from the computational perspective. Most previous works for assessing and predicting protein interactions either need supporting evidences from multiple information resources or are severely impacted by the sparseness of PPI networks. RESULTS We developed a robust manifold embedding technique for assessing the reliability of interactions and predicting new interactions, which purely utilizes the topological information of PPI networks and can work on a sparse input protein interactome without requiring additional information types. After transforming a given PPI network into a low-dimensional metric space using manifold embedding based on isometric feature mapping (ISOMAP), the problem of assessing and predicting protein interactions is recasted into the form of measuring similarity between points of its metric space. Then a reliability index, a likelihood indicating the interaction of two proteins, is assigned to each protein pair in the PPI networks based on the similarity between the points in the embedded space. Validation of the proposed method is performed with extensive experiments on densely connected and sparse PPI network of yeast, respectively. Results demonstrate that the interactions ranked top by our method have high-functional homogeneity and localization coherence, especially our method is very efficient for large sparse PPI network with which the traditional algorithms fail. Therefore, the proposed algorithm is a much more promising method to detect both false positive and false negative interactions in PPI networks. AVAILABILITY MATLAB code implementing the algorithm is available from the web site http://home.ustc.edu.cn/∼yzh33108/Manifold.htm.

[1]  Kui Zhang,et al.  Prediction of protein function using protein-protein interaction data , 2002, Proceedings. IEEE Computer Society Bioinformatics Conference.

[2]  Limsoon Wong,et al.  Exploiting Indirect Neighbours and Topological Weight to Predict Protein Function from Protein-Protein Interactions , 2006, BioDM.

[3]  Kilian Q. Weinberger,et al.  Unsupervised Learning of Image Manifolds by Semidefinite Programming , 2004, CVPR.

[4]  Mark Gerstein,et al.  Bridging structural biology and genomics: assessing protein interaction data with known complexes. , 2002, Trends in genetics : TIG.

[5]  Mong-Li Lee,et al.  Increasing confidence of protein-protein interactomes. , 2006, Genome informatics. International Conference on Genome Informatics.

[6]  D. Donoho,et al.  Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[7]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[8]  Hongbin Zha,et al.  Riemannian Manifold Learning for Nonlinear Dimensionality Reduction , 2006, ECCV.

[9]  Limsoon Wong,et al.  Author's Personal Copy Increasing the Reliability of Protein Interactomes , 2022 .

[10]  Natasa Przulj,et al.  Biological network comparison using graphlet degree distribution , 2007, Bioinform..

[11]  Martin Ester,et al.  Dense Graphlet Statistics of Protein Interaction and Random Networks , 2009, Pacific Symposium on Biocomputing.

[12]  Guimei Liu,et al.  Protein Interactome Analysis for Countering Pathogen Drug Resistance , 2010, Journal of Computer Science and Technology.

[13]  David Martin,et al.  Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network , 2003, Genome Biology.

[14]  Anil K. Jain,et al.  Incremental nonlinear dimensionality reduction by manifold learning , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Giorgio Gallo,et al.  Shortest path algorithms , 1988 .

[16]  Haruki Nakamura,et al.  Filtering high-throughput protein-protein interaction data using a combination of genomic features , 2005, BMC Bioinformatics.

[17]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[18]  Pierre Legrain,et al.  Biochemical Characterization of Protein Complexes from the Helicobacter pylori Protein Interaction Map , 2004, Molecular & Cellular Proteomics.

[19]  Desmond J. Higham,et al.  Fitting a geometric graph to a protein-protein interaction network , 2008, Bioinform..

[20]  A. Valencia,et al.  Computational methods for the prediction of protein interactions. , 2002, Current opinion in structural biology.

[21]  Yoshihide Hayashizaki,et al.  Construction of reliable protein-protein interaction networks with a new interaction generality measure , 2003, Bioinform..

[22]  Igor Jurisica,et al.  Modeling interactome: scale-free or geometric? , 2004, Bioinform..

[23]  J. Rothberg,et al.  Gaining confidence in high-throughput protein interaction networks , 2004, Nature Biotechnology.

[24]  S. Oliver Proteomics: Guilt-by-association goes global , 2000, Nature.

[25]  Feiping Nie,et al.  Nonlinear Dimensionality Reduction with Local Spline Embedding , 2009, IEEE Transactions on Knowledge and Data Engineering.

[26]  R. Sharan,et al.  Network-based prediction of protein function , 2007, Molecular systems biology.

[27]  Insuk Lee,et al.  A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality , 2007, BMC Bioinformatics.

[28]  Lawrence K. Saul,et al.  Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifold , 2003, J. Mach. Learn. Res..

[29]  Jian Wang,et al.  PRINCESS, a Protein Interaction Confidence Evaluation System with Multiple Data Sources*S , 2008, Molecular & Cellular Proteomics.

[30]  S. L. Wong,et al.  Towards a proteome-scale map of the human protein–protein interaction network , 2005, Nature.

[31]  Shmuel Sattath,et al.  How reliable are experimental protein-protein interaction data? , 2003, Journal of molecular biology.

[32]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[33]  Guimei Liu,et al.  Assessing and predicting protein interactions using both local and global network topological metrics. , 2008 .

[34]  Mong-Li Lee,et al.  Discovering reliable protein interactions from high-throughput experimental data using network topology , 2005, Artif. Intell. Medicine.

[35]  Sean R. Collins,et al.  Global landscape of protein complexes in the yeast Saccharomyces cerevisiae , 2006, Nature.

[36]  P. Bork,et al.  Proteome survey reveals modularity of the yeast cell machinery , 2006, Nature.

[37]  Sean R. Collins,et al.  Toward a Comprehensive Atlas of the Physical Interactome of Saccharomyces cerevisiae*S , 2007, Molecular & Cellular Proteomics.

[38]  Réka Albert,et al.  Conserved network motifs allow protein-protein interaction prediction , 2004, Bioinform..