Reducing Job Slowdown Variability for Data-Intensive Workloads

Bogdan Ghit, Dick Epema

Research output: Chapter in Book/Conference proceedings/Edited volumeConference contributionScientificpeer-review

3 Citations (Scopus)
54 Downloads (Pure)

Abstract

A well-known problem when executing data-intensive workloads with such frameworks as MapReduce is that small jobs with processing requirements counted in the minutes may suffer from the presence of huge jobs requiring hours or days of compute time, leading to a job slowdown distribution that is very variable and that is uneven across jobs of different sizes. Previous solutions to this problem for sequential or rigid jobs in single-server and distributed-server systems include priority-based FeedBack Queueing (FBQ), and Task Assignment by Guessing Sizes (TAGS), which kills and restarts from scratch on another server jobs that exceed the local time limit. In this paper, we derive four scheduling policies that are rightful descendants of existing size-based scheduling disciplines (among which FBQ and TAGS) with appropriate adaptations to data-intensive frameworks. The two main mechanisms employed by these policies are partitioning the resources of the datacenter, and isolating jobs with different size ranges. We evaluate these policies by means of realistic simulations of representative MapReduce workloads from Facebook and show that under the best of these policies, the vast majority of short jobs in MapReduce workloads experience close to ideal job slowdowns even under high system loads (in the range of 0.7-0.9) while the slowdown of the very large jobs is not prohibitive. We validate our simulations by means of experiments on a real multicluster system, and we find that the job slowdown performance results obtained with both match remarkably well.
Original languageEnglish
Title of host publicationIEEE 23rd Int'l Symp. on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)
Pages61 - 68
Number of pages10
DOIs
Publication statusPublished - 1 Oct 2015

Fingerprint

Dive into the research topics of 'Reducing Job Slowdown Variability for Data-Intensive Workloads'. Together they form a unique fingerprint.

Cite this