- This event has passed.

# Workshop YEQT XV "Machine Learning for Stochastic Networks"

## Nov 2 - Nov 4

#### Summary

The *Young European Queueing Theorists (YEQT)* workshop is organized annually and aims to bring together young researchers and world-leading experts. State-of-the-art research related to queueing theory, operations research, applied probability and related areas are discussed. The event provides an excellent opportunity for developing researchers to interact and exchange ideas in an informal, friendly, yet research-focused setting. The program typically consists of presentations by both senior and junior researchers, and several keynote presentations and tutorials by prominent researchers.

#### Theme and aim

This year’s theme for the YEQT workshop is *Machine Learning for Stochastic Networks*. The aim of YEQT XV is to bring together senior researchers who have made substantial contributions to advancing machine learning techniques and the optimization of stochastic networks. The methodological focus will be on research that combines theoretical stochastic modeling and optimization together with modern machine learning techniques.

To see the importance, consider for example that resource management in stochastic networks, such as occur in data centers and cloud networks, involves huge challenges due to increasingly heterogeneous applications with highly diverse traffic characteristics and performance requirements. Machine learning techniques and stochastic learning approaches offer tremendous potential to e.g. steer dynamic resource management and drive data replication in a self-optimizing manner.

#### Keynote speakers

Ton Dieker | Columbia University |

Tor Lattimore | Deepmind |

Laurent Massoulié | Inria |

Alexandre Proutiere | KTH Royal Institute of Technology |

#### Invited speakers

Tim van Erven | University of Amsterdam |

Benjamin Fehrman | University of Oxford |

Wouter Koolen | CWI |

Christina Lee Yu | Cornell University |

Frans Oliehoek | Delft University of Technology |

Mihalis Markakis | IESE Business School |

Shie Mannor | Technion |

Arnoud den Boer | University of Amsterdam |

Willem van Jaarsveld | Eindhoven University of Technology |

Louis-Sébastien Rebuffi | Inria |

Albert Senen-Cerda | Eindhoven University of Technology |

#### Programme

The schedule can be found **here.**

#### Abstracts

**Ton Dieker
QPLEX: A Modeling and Computational Methodology for Transient Analysis of a Class of Stochastic Systems
**This work was motivated by problems of managing stochastic networks featuring temporary overloads, small to moderate time-varying number of servers, non-homogeneous service-time distributions and complex routing of entities. Such problems require prediction of future system behavior over time so that appropriate adjustments can be made today.

In the talks, we present a methodology to predict transient distributions of key performance metrics over time, e.g., distributions of the number of customers or the virtual wait times in a queueing network. Our methodology is fundamentally non-parametric and non-Markovian. It includes a modeling framework capable of representing the characteristics described above, and a computational approach that quickly generates results by relying on approximate deterministic calculations.

The primitives of our modeling framework are relatively few. In simple models, they might be scalars or probability mass functions. In more elaborate models, they can be specified as code fragments. The resulting model defines the dynamics from which the desired performance metrics can be computed exactly, but this is not practical due to the sheer size of the number of calculations required. Our computational approach introduces a vastly lower dimensional space where a much smaller number of calculations suffice. Experiments on a diverse and challenging testbed yield remarkably accurate results and the time to generate thousands of distributions is on the order of seconds. For several classical queueing systems our computational approach generates exact results.

Throughout the talks, we emphasize connections with machine learning methodologies such as probabilistic graphical models, probabilistic programming, and data-driven Bayesian model selection. We also illustrate our modeling framework and computational approach with many stochastic network examples. We conclude with a brief discussion of future directions.

Joint work with Steve Hackman (Georgia Tech).

**Tor Lattimore
**

**Information-Theoretic Sequential Decision-Making**

I will talk about the information-theoretic machinery for analysing sequential decision-making problems developed by Russo and Van Roy (2014) and its many generalisations and connections to other algorithmic techniques. Along the way I'll talk about applications to multi-armed bandits and convex optimisation.

**Laurent Massoulié**

**1) Alignment of random graphs: informational and computational limits**

Graph alignment is a generic unsupervised machine learning objective with many applications, including de-anonymization of social network data. In this talk we shall consider partial alignment of correlated Erdös-Rényi random graphs. We shall describe recent results on information-theoretic limits to feasibility of alignment. We shall also describe polynomial-time algorithms for such alignment and sufficient conditions under which they succeed. The discrepancy between the informational and computational feasibility conditions suggests that the alignment problem displays a so-called “hard phase”, an intriguing phenomenon that has been observed in several other high-dimensional statistical learning tasks.

**2) Fast Distributed Optimization with Asynchrony and Time Delays**

The training of models over distributed data calls for distributed optimization schemes. This has motivated research on distributed convex optimization, leading to the identification of lower bounds on convergence speed, and distributed optimization algorithms with convergence speed potentially matching these lower bounds, for a variety of settings. In this talk we focus on the important setting of asynchronous operation, for which we propose optimization algorithms with optimal speed. We next consider systems with heterogeneous communication and computation delays, for which we propose fast asynchronous algorithms adapted to these heterogeneous delays.

**Alexandre Proutiere**

**Near-Optimal Reinforcement Learning with Block and Linear Representations**

In Reinforcement Learning, leveraging succinct representations of the system state and its dynamics and of the reward functions is empirically known to considerably accelerate the search for efficient policies. The design of RL algorithms with provable performance guarantees and that learn and exploit such representations remains however largely open. In these talks, we address this challenge for the so-called Block and linear MDPs.

In Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states. We are interested in (i) estimating the latent state decoding function (the mapping from the observations to latent states) based on data generated under some behavior policy, and (ii) learning a near-optimal policy based on our estimated block structure. For both problems, we derive fundamental limits satisfied by any learning algorithm, and devise algorithms whose performance approaches these limits. We quantify the gains achieved by learning and exploiting the structure.

Next we consider the problem of best policy identification in discounted Linear MDPs in the fixed confidence setting, under both generative and forward models. We derive an instance-specific lower bound on the expected number of samples required to identify an $\epsilon$-optimal policy with probability $1-\delta$. The lower bound characterizes the optimal sampling rule as the solution of an intricate non-convex optimization program, but can be used as the starting point to devise simple and near-optimal sampling rules and algorithms. We devise such algorithms, and analyze their sample complexity.

This is joint work with Yassir Jedra, Junghyun Lee, Jerome Taupin, and Seyoung Yun

**Benjamin Fehrman
A local convergence analysis for stochastic gradient descent
**The loss landscape in machine learning can exhibit a wide variety of phenomenon, including local but not global minimizers, saddle points, and large regions of degeneracies due, for example, to the vanishing of the gradient of the activation function. In analyzing the convergence properties of stochastic gradient descent, such behavior precludes methods based on assumptions like global convexity or contractivity, or such as a global Polyak--Łojasiewicz inequality. In this talk, we will first provide an introduction to stochastic gradient descent in machine learning, and explain some of these issues through explicit examples. We will then put forth a local analysis of stochastic gradient descent algorithms that avoids any global assumptions on the loss landscape. The analysis obtains quantitative probabilistic estimates for the convergence and is based on using mini-batch approximations of the stochastic gradient in order to control the loss of iterates to non-contracting regions.

**Tim van Erven**

**Distributed Online Convex Optimization**

In the first half of the talk, I will give an introduction to online convex optimization (OCO), which is a very general framework for sequential prediction tasks. Then, in the second half, I will present recent results on distributed online convex optimization in a setting in which multiple agents are cooperating across a network to jointly solve a single OCO task.

This is joint work with Dirk van der Hoeven and Hédi Hadiji.

**Wouter Koolen**

**Instance Optimal Pure Exploration in Bandit models***
*Pure Exploration is the art of using controlled experiments to guide decision making. It is studied in several areas, including A/B testing, Best Arm Identification and active sequential hypothesis testing. In this talk we will study how the decision at hand should influence our testing procedure. To understand the interplay, we will precisely quantify the complexity of testing problem instances, and investigate their ideal resolution. This will lead us to the notions of characteristic time and oracle allocations. Building on these concepts we will then describe a path to practical instance optimal algorithms for pure exploration.

**Christina Lee Yu**

**Adaptive Discretization in Online Reinforcement Learning**

Discretization based approaches to solving online reinforcement learning problems have been studied extensively in practice on applications ranging from resource allocation to cache management. Two major questions in designing discretization-based algorithms are how to create the discretization and when to refine it. While there have been several experimental results investigating heuristic solutions to these questions, there has been little theoretical treatment. In this paper we provide a unified theoretical analysis of tree-based hierarchical partitioning methods for online reinforcement learning, providing model-free and model-based algorithms. We show how our algorithms are able to take advantage of inherent structure of the problem by providing guarantees that scale with respect to the 'zooming dimension' instead of the ambient dimension, an instance-dependent quantity measuring the benignness of the optimal Q* function. Many applications in computing systems and operations research requires algorithms that compete on three facets: low sample complexity, mild storage requirements, and low computational burden. Our algorithms are easily adapted to operating constraints, and our theory provides explicit bounds across each of the three facets. This motivates its use in practical applications as our approach automatically adapts to underlying problem structure even when very little is known a priori about the system.

**Frans Oliehoek
Sequential decision making from the perspective of influence-based abstraction
**Reinforcement learning (RL) and more generally sequential decision making deal with problems where the decision maker ('agent') needs to take actions over time. While impressive results have been achieved on challenging domains like Atari, Go, and Starcraft, most of this work relies on neural networks to form their own internal abstractions. However, in many cases, we may be able to exploit some knowledge about the domains to guide this process.

Therefore, in this talk, I will present a more analytical approach towards abstraction. Specifically, I will cover some of the results of the ERC starting grant project 'INFLUENCE', which has further developed the framework of 'influence-based abstraction'. The idea is to describe a complex system from the perspective of a local decision problem. For instance, when trying to optimize the traffic flow at an intersection in a large city, it makes intuitive sense that we might be able to perform some local optimization, as long as we take into account how this problem is influenced over time. In my talk, I will give a formal definition of such 'influence' and discuss different ways in which we have leveraged this perspective in recent years.

**Mihalis Markakis**

**Bidding to join the queue: mechanisms, efficiency, and learning
**We investigate the design of fair and efficient bidding mechanisms for observable queues with customers that have private time valuations. Our primary motivation comes from merging in algorithmic traffic, i.e., a driver wishing to merge in a relatively dense platoon of vehicles in a coordinated and efficient way, using inter-vehicle communication and micro-payments, akin to an arriving customer trading for position in a single-server observable queue. We analyze the performance of a mechanism where the arriving customer makes sequential, take-it-or-leave-it bids from tail to head of a platoon (T2H), with the condition that the vehicle can advance to the next position only if it wins the bid. This mechanism is easily implementable, balances the budget, and imposes no negative externalities. We compare it to head-to-tail bidding (H2T), which favors the merging driver but potentially causes uncompensated externalities. Assuming i.i.d. time valuations, we obtain the optimal bids, value functions, and expected social welfare in closed form in both mechanisms. Moreover, if the time valuation of the merging driver is not high, we show that the expected social welfare of T2H is close to a partial-information social optimum, and that the expected social welfare of H2T is lower than that of T2H as long as the platoon is not too short. Our numerical experiments suggest that mechanisms based on sequential, take-it-or-leave-it bids from tail to head of an observable queue have good social welfare performance, even if the corresponding bids are not chosen optimally, as long as the time valuation of the arriving customer is not high. We extend our modeling framework to include learning some ex ante unknown parameters of the time-valuation distribution over different episodes, by combining the proposed T2H bidding with Posterior Sampling. Empirically, we establish that the unknown parameters converge very fast to their true values, resulting in excellent regret-performance for the joint scheme.

Joint work with Dmitrii Tikhonenko, Kalyan Talluri, Claudio Sánchez, and Gergely Neu.

**Arnoud den Boer
Cartel formation by data-driven price algorithms
**Can price algorithms learn to form a cartel instead of compete against each other, potentially leading to higher consumer prices and lower social welfare? The question is controversial among economists and competition policy regulators. One the one hand, concerns have been expressed that self-learning price algorithms do not only make it easier to form price cartels, but also that this can be achieved within the boundaries of current antitrust legislation – raising the question whether the existing competition law needs to be adjusted to mitigate undesired algorithmic collusion. On the other hand, a number of economists believe that algorithms learning to collude is science fiction, except by using forms of signaling or communication that are already illegal, and argue that there is no need to change antitrust laws. Motivated by this discussion, I will present recent work on learning supra-competitive prices in price and assortment games.

Based on joint work with Janusz Meylahn, Thomas Loots, and Ali Aouad.

**Willem van Jaarsveld
Algorithms for Operational Decision Making in Logistic Networks
**Companies are increasingly investing in real-time tracking of logistics resources, assets and products. To monetize this investment, they must learn to optimize their day-to-day decisions based on the latest available information, which requires new algorithms. We discuss how Deep Reinforcement Learning may underlie such algorithms.

**Louis-Sébastien Rebuffi**

**Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space
**In this talk, we revisit the regret of undiscounted reinforcement learning in MDPs with a birth and death structure. Specifically, we consider a controlled queue with impatient jobs and the main objective is to optimize a trade-off between energy consumption and user-perceived performance. Within this setting, the diameter D of the MDP is Ω(S S), where S is the number of states. Therefore, the existing lower and upper bounds on the regret at time T , of order O(√ DSAT) for MDPs with S states and A actions, may suggest that reinforcement learning is inefficient here. In our main result however, we exploit the structure of our MDPs to show that the regret of a slightly-tweaked version of the classical learning algorithm UCRL2 is in fact upper bounded by Õ(√ E 2 AT) where E 2 is related to the weighted second moment of the stationary measure of a reference policy. Importantly, E 2 is bounded independently of S. Thus, our bound is asymptotically independent of the number of states and of the diameter. This result is based on a careful study of the number of visits performed by the learning algorithm to the states of the MDP, which is highly non-uniform.

**Albert Senen-Cerda
Asymptotic convergence rate of dropout on shallow linear neural networks
**We analyze the convergence rate of gradient flows on objective functions induced by Dropout and Dropconnect, when applying them to shallow linear neural networks---which can also be viewed as doing matrix factorization using a particular regularization. Dropout algorithms such as these are regularization techniques that use {0,1}-valued random variables to filter weights during training of neural networks in order to avoid coadaptation of features. By leveraging a recent result on nonconvex optimization and conducting a careful analysis of the set of minimizers as well as the Hessian of the loss function, we are able to obtain (i) a local convergence proof of the gradient flow and (ii) a bound on the convergence rate that depends on the data, the dropout probability, and the width of the neural network. Finally, we compare this theoretical bound to numerical simulations, which are in qualitative agreement with the convergence bound and match it when starting sufficiently close to a minimizer.

#### Organization

YEQT XV is being organized by Jaron Sanders (TU/e), Peter Verleijsdonk (TU/e), and Kuang Xu (Stanford).

#### Registration

**Registration for the workshop is now closed.**