Real-Time Machine Learning: The Missing Pieces

Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a new set of requirements: computation with millisecond latency at high throughput, adaptive construction of arbitrary task graphs, and execution of heterogeneous kernels over diverse sets of resources. None of these is difficult to achieve in isolation, but their combination creates a challenge for existing distributed execution frameworks. We assert that a new distributed execution framework is needed for such ML applications and propose a candidate approach with a proof-of-concept architecture that achieves a 63x performance improvement over a state-of-the-art execution framework for a representative application.
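The requirement of "adaptive construction of arbitrary task graphs" can be made concrete with a small single-machine sketch. It uses Python's standard `concurrent.futures` module, not the framework proposed in the paper, and `simulate`, `choose_action`, and `adaptive_search` are hypothetical placeholders for a simulation task, a decision policy, and the driver loop. A new task is submitted as soon as any earlier task finishes, and its input depends on the results observed so far, so the task graph is not known before execution starts.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
import random


def simulate(action: int, seed: int) -> float:
    """Hypothetical stand-in for a simulation/rollout task; returns a reward."""
    rng = random.Random(seed)
    return action * rng.random()  # reward in [0, action)


def choose_action(best_reward: float) -> int:
    """Hypothetical policy: derive the next action (1..5) from results so far."""
    return 1 + int(best_reward) % 5


def adaptive_search(num_workers: int = 4, budget: int = 20) -> list[float]:
    """Run `budget` simulate tasks, spawning each new task from earlier results."""
    rewards: list[float] = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Seed the graph with at most one task per worker.
        initial = min(num_workers, budget)
        pending = {pool.submit(simulate, 1, i) for i in range(initial)}
        submitted = initial
        while pending:
            # React to whichever task finishes first instead of waiting for all.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                rewards.append(fut.result())
                if submitted < budget:
                    # The next task's input depends on results observed so far,
                    # so the task graph grows while it executes.
                    action = choose_action(max(rewards))
                    pending.add(pool.submit(simulate, action, submitted))
                    submitted += 1
    return rewards


rewards = adaptive_search(num_workers=4, budget=20)
print(len(rewards))  # one reward per submitted task
```

A framework that needs the whole graph declared up front cannot express this loop directly. A local thread pool can, but it provides none of the distribution, millisecond-scale scheduling at high throughput, or placement onto heterogeneous resources that the abstract lists as requirements.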

Michael I. Jordan | Ion Stoica | Stephanie Wang | Richard Liaw | Robert Nishihara | Philipp Moritz | William Paul | Alexey Tumanov | Johann Schleier-Smith