Improving Multi-job MapReduce Scheduling in an Opportunistic Environment

As a state-of-the-art programming model for big data analytics, MapReduce is well suited for parallel processing of large data sets in opportunistic environments. Existing research on MapReduce in opportunistic environments has focused on improving single-job performance, while the issue of fairness, which is critical in the more common scenario of multiple concurrent jobs, remains unexplored. We address this problem by proposing an opportunistic fair scheduling algorithm that extends the widely adopted Hadoop Fair Scheduler to environments where nodes are intermittently available, possibly with different availability patterns. The proposed scheduler maintains statistics specific to the opportunistic environment, such as node availability rates and pairwise availability correlations, and uses this information in its scheduling decisions to improve fairness. Using a Hadoop-based implementation, we compare our scheduler with the current Hadoop Fair Scheduler on representative benchmarks. Our experiments show that our scheduler can significantly reduce the variability in job completion times.
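The abstract does not spell out the scheduler's algorithm, so the following is only a hypothetical sketch of how the two statistics it names, node availability rates and pairwise availability correlations, might feed a fair-share decision. It assumes node availability is sampled as equal-length 0/1 traces at fixed intervals. The basic fair-share rule (give a freed slot to the job furthest below its weighted fair share) follows the general Fair Scheduler idea; the availability discount in `effective_share` and the correlation-based node choice in `pick_node` are illustrative assumptions, not the paper's method, and all function names are hypothetical.

```python
# Hypothetical sketch: availability-aware fair-share scheduling.
# Not the paper's algorithm; the discounting and node-selection rules are assumptions.


def availability_rate(trace):
    """Fraction of sampled intervals in which a node was up (1) rather than down (0)."""
    return sum(trace) / len(trace)


def availability_correlation(a, b):
    """Pearson correlation of two equal-length 0/1 availability traces.

    Returns 0.0 if either trace is constant, since the correlation is undefined then.
    """
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def effective_share(job, traces):
    """Running tasks of a job, each discounted by its host node's availability rate
    (assumption: a task on a node that is up half the time counts as half a slot)."""
    return sum(availability_rate(traces[node]) for node in job["nodes"])


def pick_job(jobs, traces, total_slots):
    """Give a freed slot to the job furthest below its weighted fair share."""
    total_weight = sum(j["weight"] for j in jobs)

    def ratio(job):
        fair_share = total_slots * job["weight"] / total_weight
        return effective_share(job, traces) / fair_share

    return min(jobs, key=ratio)


def pick_node(job, free_nodes, traces):
    """Prefer the free node least correlated with the job's current nodes,
    breaking ties in favor of the more available node (assumption)."""

    def score(node):
        corrs = [availability_correlation(traces[node], traces[m]) for m in job["nodes"]]
        avg_corr = sum(corrs) / len(corrs) if corrs else 0.0
        return (avg_corr, -availability_rate(traces[node]))

    return min(free_nodes, key=score)


if __name__ == "__main__":
    traces = {
        "n1": [1, 1, 1, 1, 0, 0, 1, 1],
        "n2": [1, 1, 1, 1, 0, 0, 1, 1],  # goes down at the same time as n1
        "n3": [1, 1, 0, 0, 1, 1, 1, 1],
    }
    jobs = [
        {"name": "A", "weight": 1, "nodes": ["n1"]},
        {"name": "B", "weight": 1, "nodes": []},
    ]
    job = pick_job(jobs, traces, total_slots=4)
    node = pick_node(jobs[0], ["n2", "n3"], traces)
    print(job["name"], node)  # B n3
```

In this toy run, job B has no running tasks and therefore receives the freed slot. When job A later expands, it is steered to `n3` rather than `n2`, because `n2` always fails together with A's existing node `n1`, which would stall both of A's tasks at once. Spreading a job's tasks across weakly correlated nodes is one plausible way such statistics could reduce the variability in job completion times.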
