- This event has passed.

# YES III: "Uncertainty Quantification"

## Jan 23, 2018 - Jan 25, 2018

#### Summary

Uncertainty quantification plays a central role in many areas of applied sciences, from statistics to optimization. In the broad context of statistics it is of key importance to understand how much one can trust a statistical procedure by quantifying its error as accurately as possible. This is essential for making meaningful conclusions when using any statistical procedure – the lack of a reliable description of the procedure's uncertainty will invariably lead to misleading and haphazard results with limited practical applicability.

In recent years, uncertainty quantification has been investigated in many modern, complex statistical problems through the lenses of both theory and practice. In particular, in non-parametric settings it is well known that estimators and tests whose performance adapts to unknown features of the model (e.g., smoothness of a regression function) can be devised. However, in general it is not possible to make the corresponding adaptive confidence statements without making further (often stringent) assumptions. These results are in a sense pessimistic, as they often characterize worst-case (minimax) behavior, and thus have not prevented practitioners from using uncertainty quantification methods, sometimes in inadequate ways. Progress has been made in recent years, shedding light on uncertainty quantification in complex scenarios, in both frequentist and Bayesian frameworks. Nevertheless, many questions remain unanswered concerning real-world applications involving, for instance, network models, machine learning algorithms and financial models.

This workshop aims to introduce this broad field of research to young researchers, including Ph.D. students, postdocs and other early-stage researchers, with a balanced focus on both theory and practice.

#### Organizers

Rui Castro (TU Eindhoven)

Botond Szabó (Leiden University)

#### Format

The workshop will take place at Eurandom and will consist of tutorial courses given by four world experts in the field. There will also be contributed talks, as well as plenty of time for discussion.

#### Tutorial Speakers

David Blei, Columbia University

**Variational Inference: Foundations and Innovations**

Richard Nickl, Cambridge University

**Confidence regions in high-dimensional and nonparametric statistical models**

Aad van der Vaart, Leiden University

**Nonparametric Bayesian uncertainty quantification**

Cun-Hui Zhang, Rutgers University

**Statistical inference with high-dimensional data**

#### Program

Monday (January 23rd)

| Time | Session |
| --- | --- |
| **09:20 - 09:50** | Coffee and Registration |
| **09:50 - 10:00** | Opening Remarks |
| **10:00 - 10:45** | Aad van der Vaart |
| **10:55 - 11:40** | Cun-Hui Zhang |
| **11:40 - 12:10** | Break |
| **12:10 - 12:30** | William Weimin Yoo - contributed talk |
| **12:35 - 12:55** | Elodie Vernet - contributed talk |
| **13:00 - 14:20** | Lunch |
| **14:20 - 15:05** | Richard Nickl |
| **15:15 - 16:00** | Aad van der Vaart |
| **16:00 - 16:30** | Break |
| **16:30 - 16:50** | Julyan Arbel - contributed talk |
| **16:55 - 17:15** | Federico Camerlenghi - contributed talk |
| **17:20 - 19:00** | Reception and poster session |

Tuesday (January 24th)

| Time | Session |
| --- | --- |
| **09:15 - 10:00** | Richard Nickl |
| **10:10 - 10:55** | Cun-Hui Zhang |
| **11:00 - 11:30** | Break |
| **11:30 - 11:50** | Dennis Dobler - contributed talk |
| **11:55 - 12:15** | Charles Gadd - contributed talk |
| **12:20 - 14:00** | Lunch |
| **14:00 - 14:45** | Richard Nickl |
| **14:55 - 15:40** | Aad van der Vaart |
| **15:40 - 16:10** | Break |
| **16:10 - 16:55** | David Blei |
| **17:05 - 17:25** | Nurzhan Nurushev - contributed talk |
| **17:30 - 17:50** | Rohit Patra - contributed talk |
| **19:00 - 22:00** | Conference dinner |

Wednesday (January 25th)

| Time | Session |
| --- | --- |
| **09:15 - 10:00** | David Blei |
| **10:10 - 10:55** | Cun-Hui Zhang |
| **11:00 - 11:30** | Break |
| **11:30 - 12:15** | David Blei |
| **12:30 - 14:30** | Closing of the workshop and lunch |

#### Abstracts

**Tutorial Speakers**

**Nonparametric Bayesian uncertainty quantification**

Aad van der Vaart, Leiden University

In Bayesian nonparametrics a functional parameter (density, regression function) is equipped with a prior distribution, and a posterior distribution is obtained in the standard manner. The center of the posterior distribution can be used as a point estimator. It has been documented in fair generality that reasonable priors give good reconstructions of an unknown function. In particular, priors that come with a bandwidth or sparsity parameter that is tuned to the data by a hierarchical or empirical Bayes method typically lead to posterior distributions that contract at optimal rates to the true function. However, at the core of the Bayesian method is uncertainty quantification through the full spread of the posterior distribution. For instance, one would hope that the area covered by a plot of a sample of draws (functions) from the posterior distribution can be interpreted as a confidence set. The purpose of the three talks is to investigate to what extent this is justified. A full and general answer is currently not available, but we discuss results for special models, which are thought to extend to other models as well. The talks will assume no prior knowledge of Bayesian nonparametrics; we shall start with examples of priors and contraction rates.

**Statistical inference with high-dimensional data**

Cun-Hui Zhang, Rutgers University

We consider a semi low-dimensional approach to statistical inference with high-dimensional data. The approach is best described with the following model statement:

model = low-dimensional component + high-dimensional component.

The main objective of this approach is to develop asymptotically efficient statistical inference procedures for the low-dimensional component, such as p-values and confidence regions. Just as in semiparametric inference, a sufficiently accurate estimate of the high-dimensional component is required in order to carry out the inference for the low-dimensional component. The feasibility of estimating the high-dimensional component at the required accuracy depends on the model complexity and ill-posedness, signal strength, the type of low-dimensional inference problem under consideration, and sometimes the availability of certain ancillary information. We will consider linear regression and Gaussian graphical models as primary examples. We will describe concave penalized methods which take advantage of partial signal strength, strategies and algorithms for de-biasing the Lasso and concave penalized estimators, the sample size requirement for the de-biasing methods to work, and the contributions of unlabeled data in semi-supervised regression.

**Confidence regions in high-dimensional and nonparametric statistical models**

Richard Nickl, Cambridge University

In high-dimensional and nonparametric statistical models, optimal (adaptive) estimators typically require a model selection, dimension reduction or regularisation step, and as a consequence using them for inference is a non-obvious task. In particular, 'uncertainty quantification' (the construction of adaptive 'honest' confidence regions that are valid uniformly in the parameter space) may not be straightforward, or may even be impossible. We will explain the main ideas of a decision-theoretic framework (that has emerged in the last 10 years or so) that gives general information-theoretic conditions which allow one to check whether honest confidence sets exist or not in a given statistical model, and, when the answer is negative, which 'signal strength' conditions are required to make adaptive inference possible. These conditions involve the minimax solution of certain composite high-dimensional testing problems, somewhat related to the minimax 'signal detection' problem. I will show how the general theory can be applied to several examples, such as sparse or nonparametric regression, density estimation, low rank matrix recovery and matrix completion. We will also describe some concrete uncertainty quantification procedures, Bayesian and non-Bayesian, that can be used in such models.

**Variational Inference: Foundations and Innovations**

David Blei, Columbia University

One of the core problems of modern statistics and machine learning is to approximate difficult-to-compute probability distributions. This problem is especially important in probabilistic modeling, which frames all inference about unknown quantities as a calculation about a conditional distribution. In this tutorial I review and discuss variational inference (VI), a method that approximates probability distributions through optimization. VI has been used in myriad applications in machine learning and tends to be faster than more traditional methods, such as Markov chain Monte Carlo sampling.

VI was brought into machine learning in the 1990s, and recent advances in improved fidelity and simplified implementation have renewed interest in, and application of, this class of methods. This tutorial aims to provide an introduction to VI, a modern view of the field, and an overview of the role that probabilistic inference plays in many of the central areas of machine learning.

First, I will provide a broad review of variational inference. This serves as an introduction to (or review of) its central concepts. Second, I develop and connect some of the pivotal tools for VI that have been developed in the last few years, tools like Monte Carlo gradient estimation, black box variational inference, stochastic variational inference, and variational autoencoders. These methods have led to a resurgence of research and applications of VI. Finally, I discuss some of the unsolved problems in VI and point to promising research directions.

**Contributed Speakers**

**Posterior Contraction and Credible Sets for Multivariate Regression Mode with Two-stage Improvements**

William Weimin Yoo, Leiden University

Locating the maximum of a function and its size in the presence of noise is an important problem. The optimal rates for estimating them are respectively the same as those of estimating the function and all its first order partial derivatives, if one is allowed to sample in one shot only. It has been recently observed that substantial improvements are possible when one can obtain samples in two stages: a pilot estimate obtained in the first stage guides the choice of optimal sampling locations for the second stage sampling. If the second stage design points are chosen appropriately, the second stage rate can match the optimal sequential rate. In the Bayesian paradigm, one can naturally update uncertainty quantification based on past information and hence the two-stage method fits very naturally within the Bayesian framework. Nevertheless, Bayesian two-stage procedures for mode-hunting have not been studied in the literature. In this talk, we provide posterior contraction rates and Bayesian credible sets with guaranteed frequentist coverage, which will allow us to quantify the uncertainty in the process. We consider anisotropic functions where function smoothness varies by direction. We use a random series prior based on tensor product B-splines with normal basis coefficients for the underlying function, and the error variance is either estimated using empirical Bayes or is further endowed with a conjugate inverse-gamma prior. The credible set obtained in the first stage is used to mark the sampling area for second stage sampling. We show that the second stage estimation achieves the optimal sequential rate and avoids the curse of dimensionality. This research is joint work with Dr. Subhashis Ghosal of North Carolina State University.

**Efficient semiparametric estimation and model selection for multidimensional mixtures**

Elodie Vernet, Cambridge University

Obtaining theoretical guarantees (such as uncertainty quantification) in the context of parameter estimation may be challenging in mixture models. Note that even identifiability is not trivial in these models. In this presentation, I will discuss efficiency in the context of nonparametric mixture models. More precisely, we consider mixture models where the i.i.d. observations have at least three components which are independent given the population of the observation. We do not assume a parametric model for the emission distributions, that is, the distributions of the observation given its population, and we are interested in the semiparametric estimation of the proportion of each population. Using a discretisation of the problem via projection of the densities onto histograms, we obtain an asymptotically efficient estimator. In the Bayesian setting, using a sequence of prior distributions defined on more and more complex sets as the number of observations increases, we show that the associated sequence of posterior distributions satisfies a Bernstein-von Mises type theorem with efficient Fisher information for the semiparametric problem as variance. These two asymptotic results hold provided the complexity of the approximating models does not increase too fast compared to the number of observations. We then propose a cross-validation-like procedure to select the complexity of the model in a finite horizon. This proposed procedure satisfies an oracle inequality.

These results are part of a joint work with Elisabeth Gassiat and Judith Rousseau. Reference: https://arxiv.org/abs/1607.05430.

**Bayesian nonparametric inference for discovery probabilities: credible intervals and large sample asymptotics**

Julyan Arbel, Inria Grenoble Rhône-Alpes

Given a sample of size n from a population of individuals belonging to different species with unknown proportions, a popular problem of practical interest consists in making inference on the probability D_n(l) that the (n+1)-th draw coincides with a species with frequency l in the sample, for any l=0,1,...,n. We explore in this talk a Bayesian nonparametric viewpoint for inference of D_n(l). Specifically, under the general framework of Gibbs-type priors we show how to derive credible intervals for the Bayesian nonparametric estimator of D_n(l), and we investigate the large n asymptotic behavior of such an estimator. We also compare this estimator to the classical Good–Turing estimator (joint work with Stefano Favaro (Collegio Carlo Alberto & University of Torino), Bernardo Nipoti (Trinity College Dublin) and Yee Whye Teh (Oxford University)).
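For readers unfamiliar with the classical benchmark mentioned in this abstract, the following is a minimal sketch of the Good–Turing estimate of D_n(l), namely (l + 1) m_{l+1} / n, where m_{l+1} is the number of species observed exactly l + 1 times. It is not the Bayesian nonparametric estimator of the talk; the function name and the toy sample are illustrative assumptions.

```python
from collections import Counter


def good_turing_discovery(sample, l):
    """Classical Good-Turing estimate of D_n(l) = (l + 1) * m_{l+1} / n.

    `sample` is a list of species labels; `l` is the frequency of interest.
    Illustrative helper, not code from the talk.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    if not 0 <= l <= n:
        raise ValueError("l must satisfy 0 <= l <= n")
    species_counts = Counter(sample)                  # species -> frequency
    freq_of_freqs = Counter(species_counts.values())  # frequency -> number of species
    return (l + 1) * freq_of_freqs[l + 1] / n


# Toy sample: species "a" seen 3 times, "b" twice, "c" and "d" once each.
toy_sample = ["a", "a", "a", "b", "b", "c", "d"]
# Estimated probability that the next draw is a species not yet observed (l = 0).
new_species_prob = good_turing_discovery(toy_sample, 0)
```

In this toy sample two of the seven observations are singletons, so the estimated probability of discovering a new species is 2/7. The estimates over l = 0, ..., n sum to one.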

**Hierarchical hazard rates for partially exchangeable survival times**

Federico Camerlenghi, Bocconi University

Survival analysis represents one of the first areas of application of Bayesian nonparametric techniques. A large amount of literature has been developed to model prior distributions of hazard rates for exchangeable, and possibly censored, survival times. Exchangeability corresponds to assuming homogeneity among the data, which is quite restrictive in a large variety of applied problems where data are generated by different experiments. Even if these experiments may be related, they represent a source of heterogeneity that cannot be accommodated by the exchangeability assumption. Hence, one needs to resort to more general dependence structures. In such situations partial exchangeability is a more suitable assumption. Here we define a novel class of dependent random hazard rates, which work as prior distributions in the presence of partially exchangeable survival times. They are expressed as mixtures of kernels with respect to a vector of hierarchical completely random measures, which has the advantage of enabling dependence across the diverse groups of observations. We characterize the posterior distribution of the hierarchical completely random measures, which is the key tool to estimate the survival functions through a Markov chain Monte Carlo algorithm. Moreover, we are able to obtain reliable credible intervals for the estimated quantities by developing a novel and efficient Ferguson & Klass-type algorithm that avoids marginalizing out the infinite-dimensional random elements of the model. Finally we concentrate on some illustrative examples, both real and simulated, to show the benefits of the whole construction (joint work with Antonio Lijoi and Igor Prünster).

**Resampling-Based Inference for the Mann-Whitney Effect for Right-Censored and Tied Data**

Dennis Dobler, Ulm University

In a two-sample survival setting with independent survival variables $T_1$ and $T_2$ and independent right-censoring, the Mann-Whitney effect $p = P(T_1 > T_2) + \frac12 P(T_1 = T_2)$ is an intuitive measure for discriminating two survival distributions. Comparing two treatments, the case $p > 1/2$ suggests the superiority of the first. Nonparametric maximum likelihood estimators based on normalized Kaplan-Meier estimators naturally handle tied data, which are omnipresent in practical applications. Studentizations allow for asymptotically accurate inference for $p$. For small samples, however, coverage probabilities of confidence intervals are considerably enhanced by means of bootstrap and permutation techniques. The latter even yields finitely exact procedures in the situation of exchangeable data. Simulation results support all theoretical properties under various censoring and distribution set-ups.
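To make the definition of $p$ concrete, here is a minimal sketch of its empirical counterpart for fully observed (uncensored) samples, where ties contribute one half. It deliberately ignores right-censoring, so it is not the Kaplan-Meier-based estimator discussed in the talk; the function name and the toy data are illustrative assumptions.

```python
from itertools import product


def mann_whitney_effect(sample1, sample2):
    """Empirical estimate of p = P(T1 > T2) + 0.5 * P(T1 = T2).

    Assumes fully observed survival times (no censoring).
    Illustrative helper, not code from the talk.
    """
    if not sample1 or not sample2:
        raise ValueError("both samples must be non-empty")
    score = 0.0
    for t1, t2 in product(sample1, sample2):
        if t1 > t2:
            score += 1.0   # first group survives longer
        elif t1 == t2:
            score += 0.5   # ties are split evenly
    return score / (len(sample1) * len(sample2))


# Toy survival times containing ties.
treatment = [5, 7, 7, 9]
control = [4, 7, 6]
p_hat = mann_whitney_effect(treatment, control)
```

For these toy data the estimate is 0.75, which exceeds 1/2 and so points towards the first group. Swapping the two samples gives 0.25, reflecting the symmetry of the tie-splitting convention.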

**Pseudo-Marginal Monte Carlo for the Bayesian Gaussian Process Latent Variable Model**

Charles Gadd, University of Warwick

Gaussian process latent variable models (GPLVMs) can be viewed as a non-linear extension to the dual of probabilistic principal component analysis, where in the dual we instead optimize the latent variables and marginalize the transformation matrix. In recent years these models have emerged as a powerful tool for modelling multi-dimensional data. One variant is the Bayesian GPLVM (BGPLVM), which allows for the additional marginalisation of latent variables using variational Bayes and variational sparse GP regression. We focus on a further generalization, the dynamic BGPLVM for supervised learning, which incorporates general input information through a GP prior. In GP models we choose to parameterize our kernels with a set of hyperparameters to allow for a degree of flexibility. Having marginalized over the latent space, it is common to optimise the variational parameters and hyperparameters simultaneously through maximum likelihood. However, a fully Bayesian model would infer all parameters and latent variables, and integrate over them with respect to their posterior distributions to account for their uncertainty when making predictions. Unfortunately it is not possible to obtain these analytically. We may choose to perform this inference using stochastic approximations based on MCMC, but find that the strong coupling between the latent variables and hyperparameters a posteriori provides a challenge when sampling and results in poor mixing. To break this correlation when sampling we propose the use of Pseudo-Marginal Monte Carlo, approximately integrating out the latent variables while retaining the exact posterior distribution over hyperparameters as the invariant distribution of our Markov chain, together with its ergodicity properties. This work shows the ability of a fully Bayesian treatment to better quantify uncertainty when compared to maximum likelihood or other optimization-based approaches (joint work with Sara Wade and Akeel Shah).

**Needles and Straw in a Haystack: Robust Empirical Bayes Confidence for Possibly Sparse Sequences**

Nurzhan Nurushev, VU Amsterdam

In the signal+noise model (the noise is not necessarily independent normals) we construct an empirical Bayes posterior which we then use for *uncertainty quantification* for the unknown, possibly sparse, signal. We introduce a novel *excessive bias restriction* (EBR) condition, which gives rise to a new slicing of the entire space that is suitable for uncertainty quantification. Under EBR and some mild conditions on the noise, we establish the local (oracle) confidence optimality of the empirical Bayes credible ball. In passing, we also get the local optimal results for estimation and posterior contraction problems. Adaptive minimax results (also for the estimation and posterior contraction problems) over sparsity classes follow from our local results.

**Estimation of a two-component mixture model with applications to multiple testing**

Rohit Patra, University of Florida

We consider estimation and inference in a two-component mixture model where the distribution of one component is completely unknown. We develop methods for estimating the mixing proportion and the unknown distribution nonparametrically, given i.i.d. data from the mixture model. We use ideas from shape restricted function estimation and develop "tuning parameter free" estimators that are easily implementable and have good finite sample performance. We establish the consistency of our procedures. Distribution-free finite sample lower confidence bounds are developed for the mixing proportion. We discuss the connection with the problem of multiple testing and compare our procedure with some of the existing methods in that area through simulation studies.

#### Registration

Closed

#### About the YES Workshops:

This is the eighth workshop in the series of YES (Young European Statisticians) workshops. The first was held in October 2007 on Shape Restricted Inference with seminars given by Lutz Dümbgen (Bern) and Jon Wellner (Seattle) together with shorter talks by Laurie Davies (Duisburg-Essen) and Geurt Jongbloed (Delft). The second workshop was held in October 2008 on High Dimensional Statistics with seminars given by Sara van de Geer (Zürich), Nicolai Meinshausen (Oxford) and Gilles Blanchard (Berlin). The third was held in October 2009 on Paradigms of Model Choice, with seminars given by Laurie Davies (Duisburg-Essen), Peter Grünwald (Amsterdam), Nils Hjort (Oslo) and Christian Robert (Paris). The fourth took place in November 2010, with seminars given by Judith Rousseau (Paris), Zoubin Ghahramani (Cambridge), Yongdai Kim (Seoul) and Harry van Zanten (Eindhoven). The fifth workshop took place in October 2011, with seminars by Alexander Goldenshluger (Haifa), Richard Nickl (Cambridge), Laurent Cavalier (University Aix-Marseille 1) and Eduard Belitser (TU Eindhoven). The sixth workshop was held in January 2013, with tutorials by Martin Wainwright (UC Berkeley), Eric Kolaczyk (Boston University) and Johan Koskinen (University of Manchester). Finally, the seventh workshop was held in March 2014, with tutorials by Nicolò Cesa-Bianchi (Università degli Studi di Milano), Francis Bach (INRIA - Paris) and Anatoli Juditsky (Université Joseph Fourier, Grenoble).

#### Financial support

This workshop is generously sponsored by: