MLI: An API for Distributed Machine Learning

MLI is an Application Programming Interface designed to address the challenges of building Machine Learning algorithms in a distributed setting based on data-centric computing. Its primary goal is to simplify the development of high-performance, scalable, distributed algorithms. Our initial results show that, relative to existing systems, this interface can be used to build distributed implementations of a wide variety of common Machine Learning algorithms with minimal complexity and highly competitive performance and scalability.
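The abstract speaks of building distributed implementations of common machine learning algorithms with minimal complexity. As a point of reference only, the following is a minimal sketch, not the MLI API itself, of the kind of data-parallel algorithm the paper targets: logistic regression trained by batch gradient descent, written directly against Spark (one of the platforms the MLI work builds on). All names here (LogisticRegressionSketch, LabeledPoint, train) are illustrative assumptions, not part of MLI.

```scala
// Illustrative sketch only: NOT the MLI API. It shows the data-parallel
// structure of a simple distributed learning algorithm on plain Spark.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LogisticRegressionSketch {

  // A labeled example: label in {0, 1} plus a dense feature vector.
  case class LabeledPoint(label: Double, features: Array[Double])

  // Batch gradient descent: each iteration computes partial gradients on the
  // workers, sums them on the driver, and takes one gradient step.
  def train(data: RDD[LabeledPoint], numFeatures: Int,
            iterations: Int = 20, stepSize: Double = 0.1): Array[Double] = {
    var w = Array.fill(numFeatures)(0.0)
    val n = data.count().toDouble
    for (_ <- 1 to iterations) {
      val grad = data
        .map { p =>
          // Per-point gradient of the logistic loss: (sigmoid(w.x) - y) * x.
          val margin = p.features.zip(w).map { case (xi, wi) => xi * wi }.sum
          val err = 1.0 / (1.0 + math.exp(-margin)) - p.label
          p.features.map(_ * err)
        }
        .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
      w = w.zip(grad).map { case (wi, gi) => wi - stepSize * gi / n }
    }
    w
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("lr-sketch").setMaster("local[*]"))
    // Tiny in-memory dataset, just to exercise the training loop.
    val data = sc.parallelize(Seq(
      LabeledPoint(0.0, Array(0.0, 1.0)),
      LabeledPoint(1.0, Array(2.0, 0.5)),
      LabeledPoint(1.0, Array(1.5, 1.0))
    )).cache()
    val weights = train(data, numFeatures = 2)
    println(weights.mkString("weights: [", ", ", "]"))
    sc.stop()
  }
}
```

MLI's own abstractions aim to hide this level of detail from the algorithm developer; the sketch only conveys the underlying data-parallel structure that such an interface has to express efficiently.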
