To migrate or to wait: Bandwidth-latency tradeoff in opportunistic scheduling of parallel tasks

We consider the problem of scheduling low-priority tasks onto resources already assigned to high-priority tasks. Because high-priority workloads are bursty, these resources are at times underutilized and can be lent temporarily to low-priority tasks. The resulting gain in utilization comes at a cost to the low-priority tasks, which must cope with intermittent resource availability. We focus on two major costs: the bandwidth cost of migrating tasks and the latency cost of suspending them. Our aim is to develop online scheduling policies that achieve the optimal bandwidth-latency tradeoff for parallel low-priority tasks with synchronization requirements. Under Markovian resource availability models, we formulate the problem as a Markov Decision Process (MDP) whose solution gives the optimal scheduling policy. Furthermore, in the special case of homogeneous availability patterns, we identify structural properties of the problem that enable a simple threshold-based policy that is provably optimal. We validate the efficacy of the proposed policies through trace-driven simulations.
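
To make the migrate-or-wait decision concrete, the sketch below solves a deliberately tiny discounted-cost MDP by value iteration. It is an illustration under our own assumptions, not the formulation used in this work: it models a single task (no parallelism or synchronization), a two-state Markov chain for the host ("up" or "down"), and a migration that always lands on an available host. The function name `solve_toy_mdp` and the parameters `c_latency`, `c_bandwidth`, `p_fail`, `p_recover`, and `gamma` are hypothetical.

```python
def solve_toy_mdp(c_latency, c_bandwidth, p_fail, p_recover,
                  gamma=0.9, tol=1e-10, max_iter=100_000):
    """Value iteration on a toy two-state migrate-or-wait MDP.

    Assumed model (hypothetical, for illustration only):
      up:   the task runs at zero cost; the host goes down w.p. p_fail.
      down: 'wait' costs c_latency and the host recovers w.p. p_recover;
            'migrate' costs c_bandwidth and moves the task to an up host.
    Returns (best action in the down state, V(up), V(down)).
    """
    def q_values(v_up, v_down):
        # Expected discounted cost of each action taken in the down state.
        q_wait = c_latency + gamma * (p_recover * v_up
                                      + (1.0 - p_recover) * v_down)
        q_migrate = c_bandwidth + gamma * v_up
        return q_wait, q_migrate

    v_up, v_down = 0.0, 0.0
    for _ in range(max_iter):
        q_wait, q_migrate = q_values(v_up, v_down)
        # Bellman backups: no decision in 'up', minimize cost in 'down'.
        new_up = gamma * ((1.0 - p_fail) * v_up + p_fail * v_down)
        new_down = min(q_wait, q_migrate)
        delta = max(abs(new_up - v_up), abs(new_down - v_down))
        v_up, v_down = new_up, new_down
        if delta < tol:
            break

    q_wait, q_migrate = q_values(v_up, v_down)
    action = "wait" if q_wait <= q_migrate else "migrate"
    return action, v_up, v_down


if __name__ == "__main__":
    # Sweep the migration cost to see where the preferred action flips.
    for c_bw in (0.1, 1.0, 10.0, 100.0):
        action, _, v_down = solve_toy_mdp(
            c_latency=1.0, c_bandwidth=c_bw, p_fail=0.1, p_recover=0.5)
        print(f"c_bandwidth={c_bw:6.1f}  action={action:8s}  V(down)={v_down:.3f}")
```

In this toy model the preferred action in the down state switches from migrating to waiting as the migration cost grows relative to the expected latency cost, which is the kind of threshold behavior the abstract refers to. The parallel, synchronized setting studied here has a much larger state space and is not captured by this sketch.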