Statistical Machine Learning Makes Automatic Control Practical for Internet Datacenters

Horizontally scalable Internet services on clusters of commodity computers appear to be a great fit for automatic control: there is a target output (the service-level agreement), an observed output (actual latency), and a control action (adjusting the number of servers). Yet few datacenters are automated this way in practice, due in part to well-founded skepticism about whether the simple models often used in the research literature can capture complex real-life workload/performance relationships and keep up with changing conditions that might invalidate those models. We argue that these shortcomings can be addressed by importing modeling, control, and analysis techniques from statistics and machine learning. In particular, we apply rich statistical models of the application's performance, simulation-based methods for finding an optimal control policy, and change-point methods to detect abrupt changes in performance. Preliminary results from running a Web 2.0 benchmark application driven by real workload traces on Amazon's EC2 cloud show that our method can effectively control the number of servers, even in the face of performance anomalies.
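
To make the control loop described above concrete, the following Python sketch (ours, not the paper's code) pairs a toy latency model with a greedy server-count policy and a crude change-point check. The SLA target, the latency model, and all constants are illustrative assumptions; the approach summarized in the abstract relies on richer statistical performance models and simulation-based policy search rather than these placeholders.

    # Minimal sketch of the target-output / observed-output / control-action loop.
    # All models and constants below are hypothetical placeholders.
    import random
    import statistics

    SLA_LATENCY_MS = 100.0     # hypothetical target output (SLA)
    WORK_PER_SERVER = 50.0     # assumed per-server capacity for the toy model


    def observed_latency(workload: float, servers: int) -> float:
        """Toy performance model: latency grows with per-server load, plus noise."""
        load = workload / max(servers, 1)
        return 20.0 + load * (100.0 / WORK_PER_SERVER) + random.gauss(0, 2)


    def choose_servers(workload: float, current: int) -> int:
        """Greedy policy sketch: smallest server count whose predicted latency
        meets the SLA with 10% headroom."""
        for n in range(1, 100):
            if 20.0 + (workload / n) * (100.0 / WORK_PER_SERVER) < 0.9 * SLA_LATENCY_MS:
                return n
        return current


    def changed_abruptly(latencies, window=10, threshold=3.0) -> bool:
        """Crude change-point check: flag when the recent mean deviates from the
        older mean by more than `threshold` standard deviations."""
        if len(latencies) < 2 * window:
            return False
        old, recent = latencies[-2 * window:-window], latencies[-window:]
        sd = statistics.stdev(old) or 1e-9
        return abs(statistics.mean(recent) - statistics.mean(old)) > threshold * sd


    if __name__ == "__main__":
        servers, history = 2, []
        for t in range(50):
            workload = 100.0 + 5.0 * t          # hypothetical rising workload trace
            latency = observed_latency(workload, servers)
            history.append(latency)
            if changed_abruptly(history):
                print(f"t={t}: possible performance anomaly, re-estimate the model")
                history.clear()
            servers = choose_servers(workload, servers)
            print(f"t={t}: workload={workload:.0f} latency={latency:.1f}ms servers={servers}")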