Academic Commons

Theses Doctoral

Resource Allocation In Large-Scale Distributed Systems

Shafiee, Mehrnoosh

The focus of this dissertation is the design and analysis of scheduling algorithms for distributed computer systems, i.e., data centers. Today’s data centers can contain thousands of servers and typically use a multi-tier switch network to provide connectivity among the servers. Data centers host the execution of various data-parallel applications. As an abstraction, a job in a data center can be thought of as a group of interdependent tasks, each with its own requirements, that need to be scheduled for execution on the servers, together with the data flows between the tasks, which need to be scheduled in the switch network. In this thesis, we study both flow and task scheduling problems under the features of modern parallel computing frameworks.

For the flow scheduling problem, we study three models.

The first model considers a general network topology where flows among the various source-destination pairs of servers are generated dynamically over time. The goal is to assign the end-to-end data flows among the available paths in order to efficiently balance the load in the network. We propose a myopic algorithm that is computationally efficient and prove that it asymptotically minimizes the total network cost using a convex optimization model, fluid limit and Lyapunov analysis. We further propose randomized versions of our myopic algorithm.
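As a rough illustration of what a myopic path-assignment rule can look like, the sketch below places each arriving flow on the candidate path whose total link cost increases the least. The quadratic link cost, the function names, and the data layout are assumptions made for this example only; the abstract does not specify the cost function or the exact rule used in the thesis.

```python
from typing import Callable, Dict, Sequence

Path = Sequence[str]  # a path is a sequence of link identifiers (hypothetical layout)


def cost_increase(path: Path, load: Dict[str, float], size: float,
                  link_cost: Callable[[float], float]) -> float:
    """Increase in total network cost if a flow of `size` is added to `path`."""
    return sum(link_cost(load.get(link, 0.0) + size) - link_cost(load.get(link, 0.0))
               for link in path)


def assign_flow(paths: Sequence[Path], load: Dict[str, float], size: float,
                link_cost: Callable[[float], float] = lambda x: x * x) -> Path:
    """Greedy (myopic) choice: pick the path with the smallest immediate cost increase.

    The convex quadratic default cost is an assumption, not the thesis's model.
    """
    best = min(paths, key=lambda p: cost_increase(p, load, size, link_cost))
    for link in best:
        load[link] = load.get(link, 0.0) + size  # commit the flow to the chosen path
    return best


# Usage: link "a" already carries load 3, link "b" is idle.
load = {"a": 3.0, "b": 0.0}
chosen = assign_flow([("a",), ("b",)], load, size=1.0)
```

The rule is myopic in the sense that it looks only at the current link loads and the arriving flow, not at future arrivals.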

The second model considers the case where there is dependence among flows. Specifically, a coflow is defined as a collection of parallel flows whose completion time is determined by the completion time of the last flow in the collection. Our main result is a deterministic 5-approximation algorithm that schedules coflows in polynomial time so as to minimize their total weighted completion time. The key ingredient of our approach is an improved linear program formulation for ordering the coflows, followed by a simple list scheduling policy.
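To make the coflow objective concrete, the following sketch evaluates a given coflow ordering under a deliberately simplified model: each port serves its flows one after another in list order, ports are independent, and a coflow finishes when its last flow finishes. The model, the names, and the example data are assumptions for illustration; the sketch does not reproduce the thesis's linear program or its 5-approximation guarantee.

```python
from typing import Dict, List, Tuple


def list_schedule(order: List[str],
                  demand: Dict[str, Dict[str, float]],
                  weight: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Evaluate a coflow ordering in a toy model with independent ports.

    demand[c][p] is the amount of data coflow c must send through port p.
    Returns each coflow's completion time and the total weighted completion time.
    """
    port_time: Dict[str, float] = {}   # time at which each port becomes free
    completion: Dict[str, float] = {}
    for c in order:
        finish = 0.0
        for port, size in demand[c].items():
            port_time[port] = port_time.get(port, 0.0) + size
            # The coflow completes when its last flow completes.
            finish = max(finish, port_time[port])
        completion[c] = finish
    total = sum(weight[c] * completion[c] for c in order)
    return completion, total


# Usage: two coflows sharing port p1 (hypothetical data).
demand = {"A": {"p1": 2.0, "p2": 1.0}, "B": {"p1": 1.0}}
weight = {"A": 1.0, "B": 3.0}
_, cost_ab = list_schedule(["A", "B"], demand, weight)
_, cost_ba = list_schedule(["B", "A"], demand, weight)
```

In this toy instance, serving the heavier-weighted coflow B first gives a lower total weighted completion time, which is why the choice of ordering matters.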

Lastly, we study scheduling coflows of multi-stage jobs to minimize the jobs’ total weighted completion times. Each job is represented by a DAG (Directed Acyclic Graph) among its coflows that captures the dependencies among the coflows. We define g(m) = log(m)/log(log(m)) and h(m, μ) = log(mμ)/log(log(mμ)), where m is the number of servers and μ is the maximum number of coflows in a job. We develop two algorithms with approximation ratios O(√μg(m)) and O(√μg(m)h(m, μ)) for jobs with general DAGs and rooted trees, respectively. The algorithms rely on random delaying and merging optimal schedules of the coflows in the jobs’ DAG, followed by enforcing dependency among coflows and the links’ capacity constraints.
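To get a feel for how slowly the factors g(m) and h(m, μ) grow, they can be evaluated numerically. The sketch below assumes natural logarithms, since the abstract does not state the base, and it requires the argument to exceed e so that the denominator is positive.

```python
import math


def _log_over_loglog(x: float) -> float:
    """Return log(x) / log(log(x)) using natural logs (the base is an assumption)."""
    if x <= math.e:
        raise ValueError("argument must exceed e so that log(log(x)) > 0")
    return math.log(x) / math.log(math.log(x))


def g(m: float) -> float:
    """g(m) = log(m) / log(log(m)), where m is the number of servers."""
    return _log_over_loglog(m)


def h(m: float, mu: float) -> float:
    """h(m, mu) = log(m*mu) / log(log(m*mu)), mu = max number of coflows in a job."""
    return _log_over_loglog(m * mu)


# Usage: evaluate both factors for a hypothetical cluster size and job size.
g_value = g(1000)
h_value = h(1000, 10)
```

Both factors grow more slowly than log(m) and log(mμ) themselves, because of the log(log(·)) term in the denominator.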

For the task scheduling problem, we study two models. We consider a setting where each job consists of a set of parallel tasks that need to be processed on different servers, and the job is completed once all its tasks finish processing. In the first model, each job is associated with a utility which is a decreasing function of its completion time. The objective is to schedule tasks in a way that achieves max-min fairness for jobs’ utilities. We first show a strong result regarding NP-hardness of this problem. We then proceed to define two notions of approximate solutions and develop scheduling algorithms that provide guarantees under these notions, using dynamic programming and random perturbation of tasks’ processing times. In the second model, we further assume that processing times of tasks can be server-dependent and that a server can process (pack) multiple tasks at the same time subject to its capacity. We then propose three algorithms with approximation ratios of 4, (6 + ε), and 24 for different cases where preemption and migration of tasks among the servers are or are not allowed. Our algorithms use a combination of linear program relaxation and greedy packing techniques.
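The max-min objective of the first task scheduling model can be stated in a few lines of code. The sketch below only evaluates a given schedule: a job's completion time is the finish time of its last task, and the quantity to be maximized is the smallest utility across jobs. The utility functions and the data are hypothetical, and the sketch does not implement the thesis's scheduling algorithms.

```python
from typing import Callable, Dict, List


def job_completion_times(task_finish: Dict[str, List[float]]) -> Dict[str, float]:
    """A job completes once all of its parallel tasks have finished."""
    return {job: max(times) for job, times in task_finish.items()}


def min_utility(task_finish: Dict[str, List[float]],
                utility: Dict[str, Callable[[float], float]]) -> float:
    """Smallest job utility under a schedule; max-min fairness seeks to maximize it."""
    completion = job_completion_times(task_finish)
    return min(utility[job](t) for job, t in completion.items())


# Usage: compare two candidate schedules with a decreasing utility 1 / (1 + t).
decreasing = lambda t: 1.0 / (1.0 + t)  # hypothetical utility function
utility = {"j1": decreasing, "j2": decreasing}
schedule_a = {"j1": [2.0, 5.0], "j2": [3.0]}
schedule_b = {"j1": [2.0, 4.0], "j2": [4.0]}
better = max([schedule_a, schedule_b], key=lambda s: min_utility(s, utility))
```

Here the second schedule is preferred because its worst-off job finishes at time 4 rather than 5, even though the other job finishes later than before.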

To demonstrate the gains in practice, we evaluate all the proposed algorithms and compare their performance with prior approaches through extensive simulations using real and synthesized traffic traces. We hope this work inspires improvements to existing job management and scheduling in distributed computer systems.



More About This Work

Academic Units
Electrical Engineering
Thesis Advisors
Ghaderi Dehkordi, Javad
Ph.D., Columbia University
Published Here
February 1, 2021