# Workshop "Heavy Tails"

## Dec 9 - Dec 13

#### Summary

The goal of the workshop is to bring together researchers from probability, statistics, and various application areas such as computer science, operations research, physics, engineering and finance, so that they can learn from each other about the latest developments in theory (covering both stochastic processes and spatial models), statistical and simulation algorithms, and applications.

#### Sponsors

#### Organizers

Remco van der Hofstad | TU Eindhoven |

Adam Wierman | Caltech |

Bert Zwart | CWI / TU Eindhoven |

#### Speakers

August Balkema | University of Amsterdam |

Bojan Basrak | University of Zagreb |

Mihail Bazhba | CWI |

Ayan Bhattacharya | Wroclaw University |

Alessandra Cipriani | TU Delft |

Aaron Clauset | University of Colorado |

Denis Denisov | University of Manchester |

Jules Depersin | École Nationale de la Statistique et de l'Administration Économique |

Claudia Klüppelberg | TU München |

Anja Janssen | KTH |

Julia Komjathy | TU Eindhoven |

Dmitri Krioukov | Northeastern University |

Daniel Lacker | Columbia University |

Marie-Colette van Lieshout | CWI Amsterdam |

Nelly Litvak | University of Twente |

Thomas Mikosch | University of Copenhagen |

Sid Resnick | Cornell University |

Chang-Han Rhee | Northwestern University |

Andrew Richards | National Grid ESO |

Gennady Samorodnitsky | Cornell University |

Johan Segers | UCLouvain |

Fiona Sloothaak | TU Eindhoven |

Clara Stegehuis | University of Twente |

Casper de Vries | Erasmus University Rotterdam |

Nassim Taleb | New York University |

Olivier Wintenberger | Sorbonne Université |

#### Programme

(Preliminary; a few changes may still be made)

#### Abstracts

**Linear Regression for Heavy Tails**

Linear regression for heavy tails and robust linear regression are related. There is an important difference though. In robust regression outliers are treated as contamination; in linear regression for heavy tails they are sample points invaluable for obtaining accurate estimates of the slope of the regression line.

In the basic model the explanatory variable has a Pareto distribution and the errors a Student distribution. One may write a Pareto variable as an inverse power of a standard uniform variable. The tail index of the explanatory variable is the absolute value of the exponent. The tail index of the error is the inverse of the degrees of freedom of the Student distribution. Expectations are infinite for tail indices one or larger.
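
The inverse-power construction above can be sketched numerically (a minimal illustration; the function name and parameter choices are ours, not the authors'):

```python
import random

def pareto_sample(xi, n, seed=42):
    """Draw n Pareto variables as X = U**(-xi) for U uniform on (0, 1).
    Then P(X > x) = x**(-1/xi) for x >= 1, so the tail index equals the
    absolute value xi of the exponent; E[X] = 1/(1 - xi) is finite only
    for xi < 1, matching the remark that expectations are infinite for
    tail indices one or larger."""
    rng = random.Random(seed)
    return [rng.random() ** -xi for _ in range(n)]
```

For instance, with xi = 0.25 the empirical mean of a large sample settles near 1/(1 - 0.25) ≈ 1.33, while for xi ≥ 1 it diverges as the sample grows.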

The aim is to compare the performance of a number of simple estimators. One such estimator is RMP, the line through the RightMost Point of the sample for which the number of sample points above the line and below is equal. It is not a very good estimator if the tail index of the error is large. There is a simple relation with the estimator LAD, the line of Least Absolute Deviation. That explains why the robust estimator LAD performs poorly if both tail indices are large.

(joint work with Paul Embrechts)

Reference: Guus Balkema and Paul Embrechts (2018) Linear regression for heavy tails. Risks 6(3), 93. https://doi.org/10.3390/risks6030093.

**On power law tails in stationary branching models**

Using the example of stationary subcritical branching processes with immigration, we discuss how power law tails arise in two entirely different ways. It is not entirely surprising that regular variation of the immigrant distribution induces the same property for the stationary distribution. In this talk, we explain how the assumption that the branching takes place in a random environment again generates power law tail behavior of the stationary distribution. We analyze probabilistic properties of the resulting stationary process in both cases.

The original motivation comes from the seminal 1975 paper of Kesten, Kozlov, and Spitzer, which connects a random walk in random environment to a special Galton–Watson process with immigration in a random environment. The talk is based on joint work with P. Kevei (University of Szeged).

**Sample-path large deviations for heavy-tailed processes and applications**

In this talk, we present our sample-path large deviation results for Lévy processes and random walks with heavy-tailed Weibull increments. Our approach hinges on a suitable decomposition of the Lévy process and a proper representation of the random walk. Our results for the two processes consist of an extended LDP in the Skorohod space w.r.t. the *J*_{1} topology and an LDP w.r.t. the *M'*_{1} topology. Applications in queueing theory and in large deviations theory for additive functionals of Markov random walks show the novelty of our results. Specifically, we answer the long-standing open problem posed by Sergey Foss regarding the queue length asymptotics of many-server queues with heavy-tailed Weibull service times. Lastly, we present our sample-path LDP for unbounded additive functionals of the reflected random walk with light-tailed increments. This LDP holds in the Skorohod space equipped with the *M'*_{1} topology, with a sub-linear speed.

(joint work with Jose Blanchet, Chang-Han Rhee, and Bert Zwart)

**PLFit estimation procedure and its consistency**

In Clauset et al. (2009), the PLFit estimation procedure was proposed for the power-law index and immediately became popular for its versatile applicability and good performance. The procedure has been used in many areas, including scale-free networks, energy networks, preferential attachment models, and teletraffic data. However, theoretical support for this estimation procedure is still lacking. In this talk, consistency of the PLFit procedure will be addressed under a semiparametric assumption.

(ongoing joint work with Bohan Chen, Remco van der Hofstad and Bert Zwart)
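
For readers unfamiliar with the procedure, a minimal continuous-data sketch of the PLFit recipe (our own simplified implementation; `min_tail` is a pragmatic guard of ours, not part of the original recipe):

```python
import math

def plfit(data, min_tail=50):
    """Continuous-data sketch of the PLFit recipe: for each candidate
    threshold xmin, estimate the power-law exponent alpha by maximum
    likelihood (a Hill-type formula), then keep the xmin whose fitted
    power law is closest to the empirical tail in Kolmogorov-Smirnov
    distance. `min_tail` requires at least that many points above xmin."""
    xs = sorted(data)
    best = None
    for i in range(len(xs) - min_tail):
        xmin, tail = xs[i], xs[i:]
        n = len(tail)
        alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
        # KS distance between the empirical tail CDF and the fitted
        # Pareto CDF F(x) = 1 - (x / xmin)**(1 - alpha)
        ks = max(
            abs((j + 1) / n - (1.0 - (x / xmin) ** (1.0 - alpha)))
            for j, x in enumerate(tail)
        )
        if best is None or ks < best[0]:
            best = (ks, xmin, alpha)
    return best[1], best[2]  # (estimated xmin, estimated alpha)
```

On data that is exactly Pareto above some threshold, the returned exponent should be close to the true one; the consistency question addressed in the talk is precisely when such closeness can be guaranteed.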

**Dynamical fitness models: evidence of universality classes for preferential attachment**

In this talk we study different variations of a particular class of random graphs called preferential attachment models with fitness. These are dynamic graphs in which, at every time step, a new node attaches itself to an older one with probability proportional to its degree and a random factor associated to the node, called the fitness.

Motivated by learning mechanisms of real-life networks, we assume the fitness to be a moving average process MA(q) with positive increments. We study various properties of such models, for example the degree distribution, the attachment probability and the phenomenon of Bose–Einstein condensation. Finally, we provide evidence that, by tuning the parameters of the fitness process, we fall into two universality classes, represented by the well-known Barabási–Albert and Bianconi–Barabási models. This implies the robustness of heavy tails in the degree distribution under random perturbations of the attachment rule.

(joint work with Andrea Fontanari (TU Delft))

**Scale-free networks are rare**

A central claim in modern network science is that real-world networks are typically "scale free," meaning that the degrees of their nodes follow a power-law distribution. Over the past 20 years, a large literature has explored the consequences of this pattern on network vulnerability, controllability, synchronization, and the dynamics of social or biological spreading processes. However, empirical evidence for the ubiquity of scale-free networks is derived from a relatively small number of real-world examples.

In this talk, I will first explain the historical excitement about scale-free networks and why their prevalence remains a controversial topic. Then, I will describe (i) a set of statistical evaluations for assessing the relative degree of evidence for scale-free structure in a network, and (ii) the results of applying these tools to a large corpus of nearly 1000 network data sets drawn from social, biological, technological, and informational domains. Across domains, we find that scale-free networks are rare: only 4% exhibit the strongest-possible evidence, while 52% exhibit the weakest-possible evidence. Furthermore, we find that social networks are at best weakly scale free, while a handful of technological and biological networks may be strongly scale free. These results indicate that networks are far more structurally diverse than previously appreciated, and highlight productive directions for future model development.

(joint work with Anna D. Broido)

**Markov chains with asymptotically zero drift**

We consider Markov chains on the real line with drift decaying to zero. Two main cases are distinguished: either the drift of the chain decreases as 1/x, or much more slowly than that, say as c/x^a for some a ∈ (0,1).

First I will discuss renewal theorems for transient Markov chains of this type. Then I will use these renewal theorems to study the local asymptotic behaviour of the tail probability of the invariant measure of ergodic Markov chains with decaying drift. I will show that the invariant measure is heavy-tailed.

The talk is based on joint works with D. Korshunov (Lancaster) and V. Wachtel (Augsburg):

1) Renewal Theory for Transient Markov Chains with Asymptotically Zero Drift, arXiv:1907.07940

2) At the Edge of Criticality: Markov Chains with Asymptotically Zero Drift, arXiv:1612.01592

**Robust and fast estimation of a mean vector for heavy-tailed distributions**

When it comes to estimating the mean of a heavy-tailed distribution with high probability, the empirical mean does not give satisfying results. This issue has been addressed using tools such as median-of-means (MOM) estimators. Such estimators are very simple to compute and give optimal rates of convergence when the dimension of the random variable is small, but fail to do so in high-dimensional set-ups. The first computationally tractable estimators with optimal rates were proposed very recently, using both MOM principles and SDP-relaxation tools.

Our paper presents such an estimator, that has, to our knowledge, the best running time for an estimator with optimal rate.
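
The one-dimensional median-of-means principle mentioned above can be sketched in a few lines (our illustration of the generic MOM idea, not the estimator of the paper):

```python
import random

def median_of_means(xs, k):
    """Median-of-means (MOM): split the sample into k equal-size blocks,
    average each block, and return the median of the block means. The
    median step makes the estimate insensitive to the few huge
    observations that wreck the empirical mean under heavy tails."""
    m = len(xs) // k  # block size (any leftover points are dropped)
    means = sorted(sum(xs[i * m:(i + 1) * m]) / m for i in range(k))
    mid = k // 2
    return means[mid] if k % 2 else (means[mid - 1] + means[mid]) / 2
```

The high-dimensional, computationally tractable versions discussed in the talk combine this blocking idea with SDP relaxations.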


**A k-means clustering procedure for extremes**

Dimension reduction has become an important topic in statistics and has more recently also been applied in the context of extreme value theory.

In this talk, we start by giving an overview of some approaches that have been pursued in this context so far, and continue by discussing how the standard assumption of multivariate regular variation can be used to construct simple and efficient ways to model and describe dependence structures of multivariate extremes. In particular, we introduce a k-means clustering procedure on the empirical spectral measure that allows for a comprehensive description of "extremal prototypes". We illustrate our method with several data examples.

(joint work with Phyllis Wan from Erasmus University Rotterdam)
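
A minimal sketch of what clustering the empirical spectral measure can look like (the function name, thresholding rule, and initialisation are our illustrative choices, not necessarily those of the talk):

```python
import numpy as np

def extremal_prototypes(X, k, frac=0.05, iters=50):
    """Keep the fraction `frac` of observations with the largest Euclidean
    norm, project them onto the unit sphere (their angular parts, i.e. an
    empirical version of the spectral measure), and run Lloyd's k-means on
    these angles; the k centers play the role of extremal prototypes."""
    norms = np.linalg.norm(X, axis=1)
    keep = norms >= np.quantile(norms, 1.0 - frac)
    angles = X[keep] / norms[keep, None]
    # Greedy farthest-point initialisation keeps the k seeds spread out
    centers = [angles[0]]
    for _ in range(k - 1):
        d = np.min([np.square(angles - c).sum(1) for c in centers], axis=0)
        centers.append(angles[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.square(angles[:, None, :] - centers[None, :, :]).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = angles[labels == j].mean(0)
                centers[j] /= np.linalg.norm(centers[j])  # back to the sphere
    return centers
```

On data whose extremes concentrate along a few directions, the returned centers recover those directions.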

**Estimating a graphical extreme value model**

Recursive max-linear vectors model causal dependence between their components by expressing each node variable as a max-linear function of its parental nodes in a directed acyclic graph and some exogenous innovation. Motivated by extreme value theory, the innovations are assumed to have regularly varying distribution tails. We propose a scaling technique to determine a causal order of the node variables. All dependence parameters are then estimated from the estimated scalings. Furthermore, we prove asymptotic normality of the estimated scalings and dependence parameters, based on asymptotic normality of the empirical spectral measure. Finally, we apply our structure learning and estimation algorithm to financial data and food dietary interview data.

(joint work with Mario Krali)

**Power Loss with Power Laws**

One common task in network/data science is to make reliable inferences from data, which is always finite. Perhaps the simplest example: given a real-world network adjacency matrix, is the network sparse or dense? It appears not to be widely recognized in network science that this question cannot have any rigorous answer. It is not surprising, then, that the question of whether a given network is power-law or not has not been rigorously addressed at all, even though this question is so foundational in the history of network science.

We review the state of the art in extreme value statistics, where power laws are understood as regularly varying distributions that properly formalize the idea in network science that "power laws are straight lines on the log-log scale". There exists a multitude of power-law exponent estimators whose consistent behavior in application to any regularly varying data had been proven long before network science was born. In application to real-world networks these estimators tell us what we already know: that many of these networks are scale-free. Yet applied to any data these estimators always report some estimates, and the nature of the infinite-dimensional space of regularly varying distributions is such that these estimates cannot be translated into any rigorous guarantees or hypothesis testing methodologies able to tell whether the data come from a regularly varying distribution or not. This situation is conceptually no different from the impossibility of telling whether a given finite data set is sparse or dense, whether it comes from a finite- or infinite-variance distribution, or whether it shows that the system has a phase transition. All these questions can be rigorously answered only in the infinite data size limit, never achieved in reality. An interesting big open problem in data science is how and why we tend to make correct inferences about finite data using tools and concepts that are known to work properly only at infinity and whose convergence speed is unknown.
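
One of the classical consistent exponent estimators alluded to above is the Hill estimator; a minimal sketch (our code, standard textbook formula):

```python
import math

def hill_estimator(data, k):
    """Hill estimator of the tail index based on the k + 1 largest order
    statistics: the average log-spacing between the top k observations and
    the (k + 1)-st largest. Returns an estimate of xi = 1/alpha, where
    P(X > x) ~ x**(-alpha); it is consistent for any regularly varying
    data as k -> infinity with k/n -> 0."""
    xs = sorted(data, reverse=True)
    return sum(math.log(xs[i] / xs[k]) for i in range(k)) / k
```

As the abstract stresses, such an estimator always reports *some* number on finite data, whether or not the data is actually regularly varying.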

**Extensions of Sanov's theorem via convex risk measures**

This talk is about an extension of Sanov's theorem, in Laplace principle form, based on alternatives to the classical convex dual pair of relative entropy and cumulant generating functional. The main abstract result gives rise to a number of probabilistic limit theorems and asymptotics. For instance, widely applicable non-exponential large deviation upper bounds are derived for empirical distributions and averages of i.i.d. samples under minimal integrability assumptions, notably accommodating heavy-tailed distributions. Other interesting manifestations of the abstract framework include new results on the rate of convergence of empirical measures in Wasserstein distance, uniform large deviation bounds, and error estimates for approximate solutions of stochastic optimization problems. The proofs build on the Dupuis-Ellis weak convergence approach to large deviations as well as the duality theory for convex risk measures.

**Nearest-neighbour Markov point processes on graphs with Euclidean edges**

We define nearest-neighbour point processes on graphs with Euclidean edges and linear networks. They can be seen as analogues of renewal processes on the real line. We show that the Delaunay neighbourhood relation on a tree satisfies the Baddeley–Møller consistency conditions and provide a characterisation of Markov functions with respect to this relation. We show that a modified relation defined in terms of the local geometry of the graph satisfies the consistency conditions for all graphs with Euclidean edges that do not contain triangles.

**The power law hypothesis for PageRank**

The PageRank algorithm was originally designed by Google for ranking pages in the World Wide Web. Mathematically, PageRank is the stationary distribution of a simple random walk with restart on a directed graph. The 'power law hypothesis' states that in networks with a power law in-degree distribution, PageRank, too, has a power law distribution with the same exponent as the in-degree. This is intuitively plausible and has been confirmed in a vast number of experiments from 2002 to date. However, we still cannot prove this mathematically in full generality. This talk will give an overview of formal results on the PageRank distribution in large power law networks. In particular, we will discuss the most recent approach, which exploits the notion of local weak convergence of sparse graph sequences and builds on the right balance between local and global properties of PageRank.

(joint work with Mariana Olvera-Cravioto, Yana Volkovich, Ningyuan Chen, Remco van der Hofstad, Alessandro Garavaglia)
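
The "random walk with restart" characterisation above can be sketched on a toy graph via power iteration (a minimal illustration of standard PageRank, using our own dict-of-lists graph representation):

```python
def pagerank(out_links, damping=0.85, iters=100):
    """PageRank as the stationary distribution of a random walk with
    restart: with probability `damping` the walker follows a uniformly
    chosen out-link, otherwise it restarts at a uniformly chosen node.
    Dangling nodes (no out-links) spread their mass uniformly."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for w in out_links[v]:
                    new[w] += share
            else:  # dangling node
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank
```

Nodes with more in-links accumulate more mass, which is the mechanism behind the power law hypothesis: heavy-tailed in-degrees should translate into heavy-tailed PageRank scores.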

**Extreme value theory for the entries of a sample covariance matrix**

We derive the point process convergence of the off-diagonal entries of a large sample covariance matrix based on iid data toward a Poisson process. We show how the dimension *p* = *p_{n}* → ∞ as *n* → ∞ and the tail behavior of the entries are linked. These entries constitute dependent random walks, and we are interested in their joint tail behavior. Our main tools for proving these results are precise large deviation results for sums of independent random vectors.

(joint work with Johannes Heiny (Bochum) and Jorge Yslas (Copenhagen))

**Trimming a Lévy Subordinator**

Link to abstract

**Understanding rare catastrophic events via large deviations theory for heavy tails**

Large deviations theory has been extremely successful in providing systematic tools for understanding rare events that arise in stochastic systems. However, the classical theory often falls short when the underlying uncertainties are heavy-tailed. The challenge is due to the structural difference in the way system-wide rare events arise in that case. Roughly speaking, in light-tailed settings, system-wide rare events arise because everything goes wrong a little bit (conspiracy principle), whereas in heavy-tailed settings they arise because only a few things go wrong, but they go terribly wrong (catastrophe principle).

In this talk, I will discuss recent progress in the theory of large deviations for heavy-tailed systems. The new large deviations results can deal with a very general class of rare events associated with heavy-tailed stochastic processes. In particular, I will provide a comprehensive characterization of the catastrophe principle. I will illustrate the versatility of our approach with examples that arise in the contexts of mathematical finance, actuarial science, and queueing theory. Building on the sharp asymptotics provided by our new limit theorems, we will also show how to construct simple, universal, and provably efficient rare-event simulation algorithms for heavy-tailed rare events.

(joint work with Mihail Bazhba, Jose Blanchet, Bohan Chen, Zhe Su, Xingyu Wang, and Bert Zwart)

**Power disruption on the GB electricity network: a case study in a rare, high-impact event**

We consider the power outages of 9 August 2019 on the electricity network in Great Britain. We examine the events that led to the widespread power outages. Two power stations ceased operating within one second of each other. Such an event, if the outages were independent, would have an extremely low probability. We consider in what way these two events could be considered not independent, and whether a probability can realistically be ascribed to the near-simultaneous failure of the two stations. We examine the follow-on consequences of the outages, and how these resulted in many customers having their power supply disrupted. In addition, we consider some of the more unpredictable consequences, which could be characterised as Black Swan events.

**Risk forecasting in the context of time series**

We propose an approach for forecasting the risk contained in future observations of a time series. We take into account both the shape parameter and the extremal index of the data. This significantly improves the quality of risk forecasting over methods designed for i.i.d. observations and over the return level approach.

We prove functional joint asymptotic normality of the common estimators of the shape parameter and the extremal index, based on which statistical properties of the proposed forecasting procedure can be analyzed.

(joint work with Xiaoyang Lu)
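
To make the role of the extremal index concrete, here is a minimal blocks estimator (a standard textbook estimator, shown for illustration; not necessarily the estimator analyzed in the talk):

```python
import random

def extremal_index_blocks(xs, u, b):
    """Blocks estimator of the extremal index theta: the number of
    length-b blocks containing at least one exceedance of the threshold u,
    divided by the total number of exceedances. Values near 1 indicate no
    extremal clustering; theta < 1 indicates that extremes arrive in
    clusters of mean size roughly 1/theta."""
    n = len(xs) // b * b  # drop the incomplete final block
    exceed = sum(x > u for x in xs[:n])
    blocks = sum(any(x > u for x in xs[i:i + b]) for i in range(0, n, b))
    return blocks / exceed if exceed else float("nan")
```

Applied to an i.i.d. series and to a moving-maximum series (whose extremal index is 1/2), the estimator reports a visibly smaller value for the clustered series.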

**One- versus multi-component regular variation**

One-component regular variation refers to the weak convergence of a properly rescaled random vector conditionally on the event that a single given variable exceeds a high threshold. Although the weak limit depends on the variable concerned by the conditioning event, the various limits are connected through an identity that resembles the time-change formula for regularly varying stationary time series. The formula is most easily understood through a single multi-component regular variation property concerning some (but not necessarily all) variables simultaneously.

The theory is illustrated for max-linear models, in particular recursive max-linear models on acyclic graphs, and for Markov trees. In the latter case, the one-component limiting distributions take the form of a collection of coupled multiplicative random walks generated by independent increments indexed on the edges of the tree. Changing the conditioning variable then amounts to changing the directions of certain edges and transforming their increment distributions in a specific way.

Reference:

Segers, J. (2019). "One- versus multi-component regular variation and extremes of Markov trees", https://arxiv.org/abs/1902.02226.

**Emergence of scale-free blackout sizes in power grids**

Blackouts in power grids are among the most disruptive events in our society. Due to the intricate interaction of various physical and man-made features, the propagation of cascading failures are hard to predict on a microscopic level. Nevertheless, there is a macroscopic feature of blackouts that is easy to describe: US data suggest that blackout sizes are Pareto-tailed. The origin of this law is attributed to self-organized criticality in the engineering and physics literature. We offer arguments in support of a completely different hypothesis, namely that the Pareto-tailed blackout sizes are a consequence of the scale-free nature of city population sizes. We validate our claim through data, a mathematical framework and a case study.

(joint work with Tommaso Nesti and Bert Zwart)

**Optimal structures in power-law networks**

Subgraphs contain important information about network structures and their functions. We investigate the presence of subgraphs in several random graph models with infinite-variance degrees. We introduce an optimization problem that identifies the dominant structure of any given subgraph. The optimizer describes the degrees of the vertices that together span the most likely subgraph and allows us to count and characterize all subgraphs.

We then show that this optimization problem easily extends to other network structures, such as local clustering, which expresses the probability that two neighbors of a vertex are connected. The optimization problem is able to find the behavior of clustering in a wide class of random graph models.

**On the Statistical Differences between Binary Forecasts and Real-World Payoffs**

What do binary (or probabilistic) forecasting abilities have to do with overall performance? We map the difference between (univariate) binary predictions, bets and "beliefs" (expressed as the statement that a specific "event" will or will not happen) and real-world continuous payoffs (the numerical benefits or harm from an event), and show the effect of their conflation and mischaracterization in the decision science literature. We also examine the differences under thin and fat tails.

See also https://arxiv.org/pdf/1907.11162.pdf

**Shape or Scale in Financial Returns?**

Stock markets exhibit recurrent sharp downward movements. This tail risk is well known to be non-normally distributed and is instead characterized by a hyperbolic function comparable to the shape of the Pareto distribution. To first order, this hyperbolic function is described by a power and a scale parameter. The power is generally known as the tail index. In the literature there is considerable discussion of the question whether the shape, as measured by the tail index, is the same for different stocks. The discussion focuses almost exclusively on possible differences in the tail index, with little attention to the scale. The scale, and differences in scale, however, may also play an important role in portfolio risk. In this paper we investigate the issue by testing for equality of both the power and the scale parameters. As we show, this requires first testing for equality of the tail indices. If these are unequal, the stock with the lowest tail index dominates the tail risk. If they are equal, however, the difference in tail risks stems from the unequal scale parameters. As we show, both tests are marred by a bias stemming from second-order parameters. We therefore develop tests that correct for this bias. The tests also take into account that the securities are not independent, in line with standard finance theory. Lastly, we apply the tests to a sample of stock returns and infer the economic implications.

**Asymptotic Independence ex machina -- Extreme Value Theory for the Diagonal BEKK-ARCH(1) Model**

We consider multivariate stationary processes satisfying a stochastic recurrence equation with a multiplicative factor consisting of a random factor times a deterministic diagonal matrix. We obtain a full characterization of the multivariate regular variation properties of the solution, proving that the coordinates are asymptotically independent if and only if the corresponding diagonal elements are distinct, even though all coordinates rely on the same random input. We describe the extremal properties in the framework of vector scaling regularly varying Markov chains. Our results are applied to multivariate autoregressive conditional heteroskedasticity (ARCH) processes.

(joint work with Sebastian Mentemeier)

#### Registration

Please use this **link** to register

#### Practical information

For participants looking for accommodation, the Student Hotel Eindhoven offers a 10% discount on the best available price (only for the nights of 8–14 December). The code can be retrieved by emailing Patty Koorn.

Link to more practical information

More information to follow soon!