DMFSGD: A Decentralized Matrix Factorization Algorithm for Network Distance Prediction

Knowledge of end-to-end network distances is essential to many Internet applications. As active probing of all pairwise distances is infeasible in large-scale networks, a natural idea is to measure a few pairs and predict the others without actually measuring them. This paper formulates the prediction problem as matrix completion, in which the unknown entries of a pairwise distance matrix constructed from a network are to be predicted. Assuming that the distance matrix has low-rank characteristics, the problem is solvable by low-rank approximation based on matrix factorization. The new formulation circumvents the well-known drawbacks of existing approaches based on Euclidean embedding. A new algorithm, called Decentralized Matrix Factorization by Stochastic Gradient Descent (DMFSGD), is proposed. By letting network nodes exchange messages with each other, the algorithm is fully decentralized and only requires each node to collect and process local measurements, with neither explicit matrix construction nor special nodes such as landmarks and central servers. In addition, we comprehensively compare matrix factorization and Euclidean embedding to demonstrate the suitability of the former for network distance prediction. We further study the incorporation of a robust loss function and of nonnegativity constraints. Extensive experiments on various publicly available datasets of network delays show not only the scalability and accuracy of our approach, but also its usability in real Internet applications.
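The idea of decentralized factorization by stochastic gradient descent can be illustrated with a minimal sketch. The abstract does not state the update rule, so everything below is an assumption made for illustration: each node i keeps two local vectors (here called `X[i]` and `Y[i]`), the distance from i to j is estimated as their dot product, the loss is a squared error with L2 regularization, and the ground truth is a synthetic, exactly low-rank matrix rather than real delay data. All function names and parameter values (`sgd_update`, `simulate`, `lr`, `reg`, `k`) are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgd_update(x_i, y_j, d_ij, lr, reg):
    """One regularized squared-loss SGD step for a single measurement d_ij.

    Assumed model: d_ij is approximated by dot(x_i, y_j). Both vectors are
    updated from their old values, as if nodes i and j exchanged one message.
    """
    err = d_ij - dot(x_i, y_j)
    new_x = [(1 - lr * reg) * a + lr * err * b for a, b in zip(x_i, y_j)]
    new_y = [(1 - lr * reg) * b + lr * err * a for a, b in zip(x_i, y_j)]
    return new_x, new_y


def mean_abs_error(D, X, Y, pairs):
    return sum(abs(D[i][j] - dot(X[i], Y[j])) for i, j in pairs) / len(pairs)


def simulate(n=30, rank=3, k=10, rounds=300, lr=0.05, reg=0.01, seed=0):
    rng = random.Random(seed)

    def rand_vec():
        return [rng.random() for _ in range(rank)]

    # Synthetic low-rank ground truth (an assumption, not a real dataset).
    U = [rand_vec() for _ in range(n)]
    V = [rand_vec() for _ in range(n)]
    D = [[dot(U[i], V[j]) for j in range(n)] for i in range(n)]

    # Local state: each node holds only its own two vectors and k neighbors.
    X = [rand_vec() for _ in range(n)]
    Y = [rand_vec() for _ in range(n)]
    neighbors = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]

    measured = [(i, j) for i in range(n) for j in neighbors[i]]
    measured_set = set(measured)
    unmeasured = [(i, j) for i in range(n) for j in range(n)
                  if i != j and (i, j) not in measured_set]

    before = (mean_abs_error(D, X, Y, measured),
              mean_abs_error(D, X, Y, unmeasured))

    # No node ever builds the full matrix: each step touches one pair only.
    for _ in range(rounds):
        for i in range(n):
            j = rng.choice(neighbors[i])
            X[i], Y[j] = sgd_update(X[i], Y[j], D[i][j], lr, reg)

    after = (mean_abs_error(D, X, Y, measured),
             mean_abs_error(D, X, Y, unmeasured))
    return before, after


if __name__ == "__main__":
    before, after = simulate()
    print("mean abs error (measured, unmeasured) before:", before)
    print("mean abs error (measured, unmeasured) after: ", after)
```

Each returned tuple holds the mean absolute error on measured pairs and on never-measured pairs; the second value is the one that corresponds to prediction. The sketch uses neither a robust loss nor nonnegativity constraints, both of which the paper studies separately.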
