- This event has passed.
YEQT XIV: "Load balancing and scheduling for distributed service systems"
Jun 7 - Jun 9
Distributed service systems are ubiquitous, from server farms for cloud computing and charging stations for electric vehicles, to checkout lines at supermarkets and ICU beds in hospitals. The size of many of these systems has exploded over the past few years, which has created new challenges in the design of control policies. These new challenges, such as scalability, data locality, and server and task heterogeneity, have fueled a renewed interest in the study of these systems, and accelerated the introduction of more recent theoretical tools that opened new and exciting avenues of research around this topic. This workshop will focus on the application of new approaches and techniques from queueing theory, operations research, and machine learning to the control, design, and modeling of such distributed service systems.
The 14th edition of the YEQT workshop, which will take place in the spring of 2021, aims to bring together young researchers (PhD students, postdocs, and recently appointed lecturers or assistant professors) and renowned scientists to share ideas, discuss research, and get an overview of the current trends in queueing theory and operations research. Following the trend initiated by the previous edition of the workshop, we would also like to make a bridge with disparate disciplines, such as machine learning and data science, to encourage more interdisciplinary work and foster novel ideas in load balancing and scheduling for distributed service systems. The workshop program will consist of presentations by young researchers and keynote presentations by prominent scientists.
We have had to make the decision to make this workshop an online event. Covid measures do not allow us to come together in Eindhoven. We will be using Zoom for the meeting.
|Celine Comte||TU Eindhoven|
|Youri Raaijmakers||TU Eindhoven|
|Martin Zubeldia||TU Eindhoven / UvA|
|Urtzi Ayesta||CNRS and University of Basque Country|
|Siva Theja Maguluri||Georgia Institute of Technology|
|Devavrat Shah||Massachusetts Institute of Technology|
|Emina Soljanin||Rutgers University|
|R. Srikant||University of Illinois at Urbana-Champaign|
|Alexander Stolyar||University of Illinois at Urbana-Champaign|
|Ellen Cardinaels||Eindhoven University of Technology|
|Josu Doncel||University of Basque Country|
|Kristy Gardner||Amherst College|
|Isaac Grosof||Carnegie Mellon University|
|Lucas van Kreveld||University of Amsterdam|
|Debankur Mukherjee||Georgia Institute of Technology|
|Weina Wang||Carnegie Mellon University|
|Qiaomin Xie||Cornell University|
|Xingyu Zhou||Wayne State University|
*** The social event will be open for the first 70 registered participants. So, if you want to join the fun event, be sure to register soon!!
Recent results on redundancy: product-form distributions and stability conditions
Redundancy (aka replication) has received lot of attention recently in the context of load balancing problems. With redundancy several copies of the same job are dispatched to various servers. We will show that under appropriate assumptions the steady-state distribution is of product-form, which permits the characterization of performance metrics like mean delays and stability conditions. We will then discuss the stability conditions as a function of the modeling assumptions (service time distributions, correlation among copies, scheduling discipline etc.) We will conclude by discussing open questions and future research directions.
Siva Theja Maguluri
A Heavy-Traffic Theory of Two-Sided Queues with Applications in Matching Platforms and Online Marketplaces
Emerging applications in online matching platforms and marketplaces motivate the study of two-sided queues, where customers and servers depart as soon as they are matched. In contrast to rich theory of classical single sided queues, there is very little understanding of two-sided queues, especially in steady-state. Starting with an introduction to a two-sided queue, the talk presents fundamental stability issues that need to be overcome to ensure existance of steady-state. The notion of heavy-trraffic will be introduced, and asymptotic behavior in a single two-sided queue will be characterized. The key take-away is that the limiting distribution exhibits a phase transition between a Laplace distribution and a uniform distribution.
As an application of this theory, the problem of pricing and matching in an online marketplace such as a ride-hailing or meal-delivery system will be considered. Such a system can be modeled as a bipartite graph of two-sided queues. We present an asymptotically optimal pricing and matching algorithm. In contrast to the square root scaling that is ubiquitous in queueing theory, the optimal profit has a cube root scaling in the size of the system. Using the heavy-traffic results, we argue that such a cube root scaling capatures the fundamental trade-off between profit maximization and delay minimization objectives.
A prototypical example of causal inference with observational data pertains to evaluating impact of a policy, such as universal background check for gun purchase, on outcome of interest, such as gun violence, with respect to alternatives such as no background check or stricter form of gun control law. Unlike the setting of clinical trials where randomized control experiments are feasible, such is not feasibility for policy evaluation. To address this, we present a causal framework, synthetic interventions (SI), that extends synthetic control (SC) to the multiple treatment setting. Formally, given N units (e.g. states) and D interventions (e.g. various gun control policies), the aim of SI is to estimate the counterfactual outcome of each unit under each of the D interventions (including control). We showcase the efficacy of the SI framework on several real-world applications, such as running data-efficient A/B tests in e-commerce and correcting for bias in clinical trials due to dropouts. Finally, we show how to produce tight confidence intervals around our causal estimates. The key to our framework is connection between causal inference and tensor estimation.
(joint work with Anish Agarwal and Dennis Shen, both at MIT)
Codes for (Un)expected Loads
Distributed computing systems strive to maximize the number of concurrent data access requests they can support with fixed resources. Replicating data objects according to their relative popularity and access volume helps achieve this goal. However, these quantities are often unpredictable. In emerging applications such as edge computing, not only the instantaneous but also the expected numbers of users and their data interests extensively fluctuate. Therefore, data storage schemes should support such dynamics. Erasure-coding is emerging as an efficient and robust form of redundant storage. This talk asks which data access rates erasure-coded systems can support. It introduces the notion of access service rate region and argues that it should be an essential consideration in designing efficient distributed systems that must remain stable for a wide range and multiple combinations of mean loads. We will explain the recently recognized connections with batch codes for load balancing and combinatorial optimization on graphs. We will discuss some systems issues as well.
The Lyapunov Drift Analysis for Performance Analysis
The Lyapunov drift method is a powerful tool for performance analysis of stochastic networks and other stochastic models. In this talk, we will survey different applications and different methods for choosing the Lyapunov function for getting good performance bounds.
Two Large-scale Particle Systems with Mean-field Interaction
We discuss two different models, which can be viewed as large-scale particle systems with mean-field interaction. In the first model, n particles move forward on the real line by jumps, with the instantaneous rate of a particle jump given by a decreasing function of the particle's location rank among all particles. When n is large, the evolution of the particle locations' distribution is described by a mean-field limit. Our main results here concern the existence and uniqueness of - and attraction to - mean-field limits which are traveling waves.
The second model covers a broad class of parallel server systems with multi-component jobs. (The model is broad enough to include as special cases some popular queueing models with redundancy, such as cancel-on-start and cancel-on-completion redundancy.) There are multiple job classes, and each class may be of one of two types -- least-load or water-filling -- which determine the rule according to which the job components add workloads to the servers. Here we prove the steady-state asymptotic independence of server workloads, as the number n of servers goes to infinity, while the system load remains sub-critical. The analysis, again, uses mean-field limits.
In part, this is a joint work with Seva Shneer (Heriot-Watt University)
Heavy-Traffic Universality of Redundancy Systems with Data Locality Constraints
Heterogeneity and compatibility relations between jobs and servers are becoming ubiquitous in cloud computing platforms due to data locality and network topology constraints. These features strongly diverge from the inherent symmetry of the supermarket model as the baseline scenario for performance benchmarking in parallel-server systems.
In this talk we will specifically focus on redundancy scheduling systems with compatibility constraints to gain insight from product-form distributions via a heavy-traffic limit. The asymptotics reveal a striking universality property, as the system achieves complete resource pooling and exhibits the same behavior across a broad range of scenarios. In particular, the performance of a fully flexible system can asymptotically be matched even with quite stringent compatibility constraints.
(joint work with Sem Borst and Johan van Leeuwaarden)
Size-based Routing Policies to Parallel-Server Systems
The Size Interval Task Assignment (SITA) is an open-loop routing policy in which the service time distribution is divided into intervals and all the jobs whose size fall in a given interval are dispatched to the same server. We characterize the performance degradation when we scale up the number of servers and the arrival rate proportionally and we also study the performance of this system in an asymptotic regime where the system capacity grows linearly with the system demand to infinity. Finally, we present a variant of the SITA policy where the knowledge of service time of jobs is not required and we compare its performance with the performance of SITA.
A General "Power-of-d" Dispatching Framework for Heterogeneous Systems
Intelligent dispatching is crucial to obtaining low response times in large-scale systems. One common scalable dispatching paradigm is the "power-of-d," in which the dispatcher queries d servers at random and assigns the job to a server based only on the state of the queried servers. The bulk of power-of-d policies studied in the literature assume that the system is homogeneous, meaning that all servers have the same speed; meanwhile real-world systems often exhibit server speed heterogeneity.
We introduce a general framework for describing and analyzing heterogeneity-aware power-of-d policies. The key idea behind our framework is that dispatching policies can make use of server speed information at two decision points: when choosing which d servers to query, and when assigning a job to one of those servers. Our framework explicitly separates the dispatching policy into a querying rule and an assignment rule; we consider general families of both rule types.
While the strongest assignment rules incorporate both detailed queue-length information and server speed information, these rules typically are difficult to analyze. We overcome this difficulty by focusing on heterogeneity-aware assignment rules that ignore queue length information beyond idleness status. In this setting, we analyze mean response time and formulate novel optimization problems for the joint optimization of querying and assignment. We build upon our optimized policies to develop heuristic queue length-aware dispatching policies. Our heuristic policies perform well in simulation, relative to policies that have appeared in the literature.
Asymptotically Optimal Multiserver Scheduling: SRPT and Gittins
In stochastic scheduling theory, it is well understood how to minimize mean response time in single-server models, like the M/G/1. If job sizes are known, the Shortest Remaining Processing Time (SRPT) policy is optimal. Its generalization, the Gittins policy, is optimal even when job sizes are not known or if partial size information is known.
Unfortunately, almost nothing was known about any scheduling policies in the M/G/k, prior to this line of work.
In a pair of papers, we give the first stochastic bounds on mean response time of SRPT and Gittins in a multiserver system, the M/G/k. Using our response time bounds, we show that SRPT and Gittins have asymptotically optimal mean response time in the M/G/k, in the limit as load approaches capacity.
While our papers differ in many important ways, they share a common structure. In each paper, the core of our result is the concept of relevant work - the work that must complete before a given job may run. We first express response time in terms of relevant work, then we bound relevant work in the M/G/k relative to relevant work in a resource-pooled M/G/1. To prove these two steps in our two papers, we combine many different techniques in a novel fashion: tagged-job analysis, ideas from worst-case scheduling theory, a new understanding of the Gittins policy, and ideas from Palm calculus.
(joint work with Ziv Scully and Mor Harchol-Balter)
Lucas van Kreveld
When Does the Gittins Policy Have Asymptotically Optimal Response Time Tail?
We consider scheduling in the M/G/1 queue with unknown job sizes. It is known that the Gittins policy minimizes mean response time in this setting. However, the behavior of the tail of response time under Gittins is poorly understood, even in the large-response-time limit. Characterizing Gittins's asymptotic tail behavior is an important problem, because if Gittins has optimal tail asymptotics, then it simultaneously provides optimal mean response time and good tail performance.
In this talk, we discuss new results that give the first comprehensive account of Gittins's asymptotic tail behavior. For heavy-tailed job sizes, we find that Gittins always has asymptotically optimal tail. The story for light-tailed job sizes is less clear-cut: Gittins's tail can be optimal, pessimal, or in between. To remedy this, we propose a modification of Gittins which avoids pessimal tail behavior while achieving near-optimal mean response time.
A New Approach to Capacity Scaling Augmented With Unreliable Machine Learning Predictions
Modern data centers suffer from immense power consumption. The erratic behavior of internet traffic forces data centers to maintain excess capacity in the form of idle servers in case the workload suddenly increases. As an idle server still consumes a significant fraction of the peak energy, data center operators have heavily invested in capacity scaling solutions. In simple terms, these aim to deactivate servers if the demand is low and to activate them again when the workload increases. To do so, an algorithm needs to strike a delicate balance between power consumption, flow-time, and switching costs. Over the last decade, the research community has developed competitive online algorithms with worst-case guarantees. In the presence of historic data patterns, prescription from Machine Learning (ML) predictions typically outperform such competitive algorithms. This, however, comes at the cost of sacrificing the robustness of performance, since unpredictable surges in the workload are not uncommon. The current work builds on the emerging paradigm of augmenting unreliable ML predictions with online algorithms to develop robust algorithms that enjoy the benefits of both worlds.
In this talk, we will present a continuous-time model for capacity scaling, where the goal is to minimize the weighted sum of flow-time, switching cost, and power consumption in an online fashion. We will describe a novel algorithm, called Adaptive Balanced Capacity Scaling (ABCS), that has access to black-box ML predictions, but is completely oblivious to the accuracy of these predictions. In particular, if the predictions turn out to be accurate in hindsight, we prove that ABCS can be (1+ε)-competitive. Moreover, even when the predictions are inaccurate, ABCS guarantees a bounded competitive ratio.(joint work with Daan Rutten)
Zero Queueing for Multi-Server Jobs
Cloud computing today is dominated by multi-server jobs. These are jobs that request multiple servers simultaneously and hold onto all of these servers for the duration of the job. Multi-server jobs add a lot of complexity to the traditional one-server-per-job model: an arrival might not ``fit'' into the available servers and might have to queue, blocking later arrivals and leaving servers idle. From a queueing perspective, almost nothing is understood about multi-server job queueing systems; even understanding the exact stability region is a very hard problem.
In this paper, we investigate a multi-server job queueing model under scaling regimes where the number of servers in the system grows. Specifically, we consider a system with multiple classes of jobs, where jobs from different classes can request different numbers of servers and have different service time distributions, and jobs are served in first-come-first-served order. The multi-server job model opens up new scaling regimes where both the number of servers that a job needs and the system load scale with the total number of servers. Within these scaling regimes, we derive the first results on stability, queueing probability, and the transient analysis of the number of jobs in the system for each class. In particular we derive sufficient conditions for zero queueing. Our analysis introduces a novel way of extracting information from the Lyapunov drift, which can be applicable to a broader scope of problems in queueing systems.
Data-Driven Stochastic Network Control via Reinforcement Learning
Finding a good control policy of stochastic queueing networks is crucial for achieving desirable performance (e.g., high throughput or low delay). These stochastic control problems have been recognized to be notoriously difficult. In this talk, we consider a new paradigm of data-driven approaches that learn network control policies using Reinforcement Learning (RL).
Specifically, we study the problem of RL with unbounded state spaces --- a ubiquitous feature of queueing networks. Existing approaches and performance metrics, mostly designed for finite or compact state spaces, are not well suited for this setting. Inspired by the literature in queuing systems and control theory, we propose a notion of stability as a performance metric for RL policies: the state dynamics under the policy should remain in a bounded region with high probability. Building on our prior work on Monte Carlo tree search, we develop an RL policy that provably achieves the stability property whenever the optimal policy respects a Lyapunov function. Moreover, we design an adaptive version of the algorithm, based on carefully constructed statistical tests, which does not require knowledge of the Lyapunov function nor its drift parameter.
Given a stable policy, we further develop a model-based RL method and prove that it converges to the optimal policy. Our method demonstrates promising performance in a variety of network control problems including routing, dynamic server allocation and switch scheduling.
Stein’s Method for Heavy-Traffic Analysis: Load Balancing and Scheduling
In this talk, we apply the framework of Stein’s method (i.e., Braverman et al) to provide a sharp analysis of queueing systems in heavy traffic. The use of Stein’s method enables us to utilize the bounds established by the drift method to directly characterize the stationary distribution in the heavy traffic limit. More importantly, it also provides the convergence rate characterization in terms of the Wasserstein distance. We illustrate the power and elegance of this new analysis through applications in both load balancing and Max-Weight scheduling.
Registration is now closed.
For more information on previous YEQT workshops, please follow this link.