- This event has passed.

# YES IV: “Bayesian Nonparametric Statistics”

## Nov 8, 2010 - Nov 10, 2010

#### Summary

Bayesian Nonparametrics is a rapidly growing field at the interface of statistics, probability theory and machine learning. While the Bayes approach is a very general and powerful inference tool (it naturally allows one to include prior information, to treat hierarchies of unknown parameters, etc.), its use in nonparametric settings, where potentially infinitely many parameters are unknown, turns out to be an exciting and challenging task. At the same time, the recent development of powerful simulation algorithms and the increase in computational power have given a strong impetus to the practical use of Bayesian methods.

The research topics range from the construction of prior distributions on very high-dimensional spaces to inference based on the posterior distribution, and from establishing posterior consistency to determining convergence rates and limit shapes of posterior distributions. Recent advances have led to a rapidly growing literature and to increasing application of nonparametric Bayesian techniques in fields ranging from machine learning to biostatistics and financial mathematics.

The present workshop is directed at young statisticians, in particular Ph.D. students, postdocs and junior researchers, who are interested in the subject of Bayesian Nonparametrics. Mini-courses, each consisting of three 45-minute talks on various aspects of Bayesian Nonparametrics, will be given.

This is the fourth workshop in the series of YES (Young European Statisticians) workshops. The first was held in October 2007 on Shape Restricted Inference, with seminars given by Lutz Dümbgen (Bern) and Jon Wellner (Seattle) together with shorter talks by Laurie Davies (Duisburg-Essen) and Geurt Jongbloed (Delft). The second was held in October 2008 on High Dimensional Statistics, with seminars given by Sara van de Geer (Zürich), Nicolai Meinshausen (Oxford) and Gilles Blanchard (Berlin). The third was held in October 2009 on Paradigms of Model Choice, with seminars given by Laurie Davies (Duisburg-Essen), Peter Grünwald (Amsterdam), Nils Hjort (Oslo) and Christian Robert (Paris).

#### Organisers

• dr. B.J.K. Kleijn, Korteweg-De Vries Instituut voor Wiskunde, Amsterdam

• dr. I. Castillo, CNRS, Laboratoire Probabilités et Modèles Aléatoires, Paris

• Prof. dr. G. Jongbloed, University of Technology Delft

#### Speakers

**Keynotes**

Prof. Zoubin Ghahramani, Cambridge

Prof. Yongdai Kim, Seoul National University

Prof. Judith Rousseau, Paris-Dauphine University

Prof. Harry van Zanten, TU Eindhoven

**Contributed**

Julyan Arbel

Alexandra Babenko

Eleni Bakra

Dominique Bontemps

Maria Anna Di Lucca

René de Jonge

Soleiman Khazaei

Bartek Knapik

Willem Kruijer

Steffen Ventz

#### Abstracts

MINI COURSES (keynote)

**Zoubin Ghahramani**

**I. A Brief Overview of Nonparametric Bayesian Models**

The flexibility of nonparametric Bayesian (NPB) methods for data modelling has generated an explosion of interest in the last decade in both the Statistics and Machine Learning communities. I will give an overview of some of the main NPB models, and focus on the relationships between them. Focusing on the Dirichlet process (DP) and its relatives, I plan to give a whirlwind tour of the DP and Beta process, the associated Chinese restaurant and Indian buffet, hierarchical models such as Kingman’s coalescent, the Dirichlet diffusion tree, and the Hierarchical DP (HDP), time series models such as the infinite HMM, dependent models such as the dependent Dirichlet process, and other topics such as completely random measures and stick-breaking constructions, time permitting. I will also try to give an overview of inference methods for NPB models (MCMC and alternatives).
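As a concrete illustration of one object named in this overview, the following is a minimal sketch of the sequential seating rule of the Chinese restaurant process. It is not taken from the course material: the function name `sample_crp`, its parameters and the use of Python's standard `random` module are assumptions made for this example, and `alpha` (assumed positive) plays the role of the DP concentration parameter.

```python
import random


def sample_crp(n_customers, alpha, rng=None):
    """Seat customers one at a time (hypothetical helper, illustration only).

    Returns (assignments, table_sizes), where assignments[i] is the table
    index of customer i and table_sizes[k] is the occupancy of table k.
    """
    rng = rng or random.Random()
    table_sizes = []
    assignments = []
    for i in range(n_customers):
        # With i customers already seated, the next one joins table k with
        # probability size_k / (i + alpha) and opens a new table with
        # probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        choice = len(table_sizes)  # default: open a new table
        for k, size in enumerate(table_sizes):
            cumulative += size
            if u < cumulative:
                choice = k
                break
        if choice == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
        assignments.append(choice)
    return assignments, table_sizes
```

Calling `sample_crp(100, 1.0)` returns a random partition of 100 items; larger values of `alpha` tend to produce more tables.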

**II. Gaussian processes**

Gaussian processes (GPs) are fundamental stochastic processes with a long history in Statistics and Machine Learning. GPs define distributions over unknown functions and offer an elegant framework for Bayesian supervised kernel regression and classification, providing estimates of the uncertainty on quantities of interest and a principled framework for automatic feature selection and for learning the parameters of the kernel. I will give a tutorial on GPs, with Matlab demos, and discussion of the relation to support vector machines (SVMs). The tutorial will be loosely based on material and software from the textbook “Gaussian Processes for Machine Learning” by Rasmussen and Williams (2006), which is freely available online and a must-have on every machine learning and statistics researcher’s “bookshelf”.
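To make the regression setting concrete, here is a small pure-Python sketch of GP regression with a squared-exponential kernel, using the standard Cholesky-based formulas for the posterior mean and variance. It is an illustration only and not the Matlab software used in the tutorial. All function names, the kernel choice and the default hyperparameters (`length_scale`, `signal_var`, `noise_var`) are assumptions, and the inputs are taken to be one-dimensional.

```python
import math


def rbf_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def solve_lower_transposed(low, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / low[i][i]
    return x


def gp_posterior(x_train, y_train, x_test, noise_var=0.1, **kernel_args):
    """Posterior mean and variance of the latent function at x_test."""
    # Covariance of the noisy training observations: K + noise_var * I.
    k_train = [[rbf_kernel(a, b, **kernel_args) + (noise_var if i == j else 0.0)
                for j, b in enumerate(x_train)]
               for i, a in enumerate(x_train)]
    low = cholesky(k_train)
    # weights = (K + noise_var * I)^{-1} y, via two triangular solves.
    weights = solve_lower_transposed(low, solve_lower(low, y_train))
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf_kernel(xs, a, **kernel_args) for a in x_train]
        means.append(sum(ks * w for ks, w in zip(k_star, weights)))
        v = solve_lower(low, k_star)
        variances.append(rbf_kernel(xs, xs, **kernel_args) - sum(vi * vi for vi in v))
    return means, variances
```

Far from the training inputs the posterior reverts to the prior (mean zero, variance `signal_var`), which is the uncertainty behaviour the abstract refers to.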

**III. Indian Buffet processes**

Much work in nonparametric Bayesian statistics focuses on the Dirichlet process (DP) and its associated combinatorial object, the Chinese restaurant process (CRP). The DP and CRP have found important uses in mixture modelling, allowing inference in models with a countable but unbounded number of mixture components. In analogy to CRPs, we have recently developed the Indian buffet process (IBP) which defines probability distributions on sparse binary matrices with exchangeable rows and an unbounded number of columns. The IBP makes it possible to define and do inference in models with an unbounded number of latent variables. I will review properties of the IBP, inference algorithms, and a number of applications, including: sparse latent factor and independent components models, time series models with an unbounded number of hidden processes, and nonparametric matrix factorisation models. Time permitting, I will describe recent extensions of the IBP.
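The generative scheme behind the Indian buffet process can be sketched in a few lines. The code below is an illustrative assumption rather than the speaker's implementation: `sample_ibp` and `sample_poisson` are hypothetical names, `alpha` is the mass parameter, and the Poisson sampler uses Knuth's multiplication method, which is adequate only for small rates.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) by Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(n_customers, alpha, rng=None):
    """Return a binary matrix (list of rows) drawn from the buffet scheme."""
    rng = rng or random.Random()
    dish_counts = []  # dish_counts[k] = number of customers who took dish k
    rows = []         # rows[i] = set of dish indices taken by customer i
    for i in range(1, n_customers + 1):
        chosen = set()
        # Customer i takes an existing dish k with probability m_k / i.
        for k, m in enumerate(dish_counts):
            if rng.random() < m / i:
                chosen.add(k)
        # The customer then samples Poisson(alpha / i) new dishes.
        for _ in range(sample_poisson(alpha / i, rng)):
            chosen.add(len(dish_counts))
            dish_counts.append(0)
        for k in chosen:
            dish_counts[k] += 1
        rows.append(chosen)
    n_dishes = len(dish_counts)
    return [[1 if k in row else 0 for k in range(n_dishes)] for row in rows]
```

Each row of the returned matrix is one object and each column one latent feature; the number of columns is random and grows slowly with the number of rows.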

**Yongdai Kim**

**Bayesian Survival analysis**

**I. Model, Prior and Posterior**

Bayesian analysis for censored data is reviewed. Firstly, Bayesian inference of the survival function with right-censored data is considered. Neutral to right process priors are introduced and the corresponding posteriors are derived. Secondly, Bayesian analysis of the cumulative hazard function is reviewed. In particular, beta processes are explained and shown to be conjugate. Bayesian analysis of counting processes is also considered. Thirdly, mixture priors are introduced and the corresponding posteriors are derived. Fourthly, Bayesian analysis of the proportional hazards model is considered. In particular, the propriety of the posterior distribution with the uniform flat prior on the regression coefficients is explained. Finally, if time allows, Indian buffet processes are introduced and their extensions are discussed.

**II. Asymptotics**

Large-sample properties of the posterior distribution of the survival function with right-censored data are reviewed. For priors, a class of neutral to right processes is considered. Firstly, it is shown that posterior consistency does not hold for all neutral to right process priors, and sufficient conditions for posterior consistency are given. Most commonly used priors, including Dirichlet processes, beta processes and gamma processes, are shown to be consistent. Secondly, the Bernstein-von Mises theorem is discussed. A class of extended beta processes is considered and necessary and sufficient conditions for the Bernstein-von Mises theorem are given. Then, sufficient conditions for the Bernstein-von Mises theorem for general neutral to right process priors are explained. Thirdly, the Bernstein-von Mises theorem for the proportional hazards model is proved. Fourthly, the Bernstein-von Mises theorem with doubly censored data is explained.

**III. Computations and future work**

Various computational algorithms for Bayesian survival analysis are explained. Firstly, an MCMC algorithm with a Dirichlet process prior is explained for data with complicated censoring, including interval-censored data. Secondly, methods of generating sample paths of neutral to right process priors are explained and applications to MCMC algorithms are discussed. Thirdly, an MCMC algorithm for the proportional hazards model is considered. Fourthly, a Bayesian bootstrap approach and its application to approximating the posterior distribution are explained. Finally, some possible directions for future research in Bayesian survival analysis are discussed.

**Judith Rousseau**

**I. Concentration properties of the posterior distribution for nonparametric mixture models**

In this course we introduce general classes of Bayesian nonparametric mixture models for the density of observations. We give general results on weak consistency of the posterior distribution under these models. A sufficient condition to obtain posterior consistency is the famous Kullback-Leibler condition and we will discuss such conditions in a general setup (Wu and Ghosal). Then we discuss cases where the Kullback-Leibler condition is not satisfied but which still lead to consistency of the posterior. In the second part of this course we discuss rates of concentration of the posteriors under location-scale mixture models or Beta mixture models. We explain in particular why such models lead naturally to minimax adaptive concentration rates (and thus to minimax adaptive estimators).

**II. Semi-parametric Bayesian estimation**

In this course we concentrate on semi-parametric models. Let $X$ be a set of observations whose distribution is governed by a parameter $\lambda \in \Lambda$, where $\Lambda$ is infinite-dimensional but the object of interest is a functional of $\lambda$, say $\psi(\lambda)$, which belongs to a finite-dimensional space. Two cases typically occur: (1) separated: $\lambda = (\psi, \eta)$ and the object of interest is $\psi \in \mathbb{R}^k$, whereas $\eta$ is infinite-dimensional; (2) functional: $\psi : \Lambda \to \mathbb{R}^k$. A typical example of the first case is the partially linear model and a typical example of the second case is the estimation of the cumulative distribution function at a given point. In this course we give some recent results on the Bernstein-von Mises property of the posterior distribution in these two setups of semi-parametric analysis. The Bernstein-von Mises property of the posterior distribution means that the posterior distribution of the object of interest (here $\psi$) is asymptotically Gaussian with mean $\hat{\psi}$ and variance $V_n$, where $\hat{\psi}$ is a quantity depending on the observations such that its asymptotic distribution under $P_{\lambda_0}$ (frequentist) is also Gaussian with mean $\psi(\lambda_0)$ and variance $V$. Such results are of considerable practical interest, since they imply in particular that Bayesian credible regions on $\psi$ (such as HPD regions) are also asymptotically frequentist confidence regions.

**III. Asymptotic behaviour of the Bayes factor in nonparametric tests**

In this course we consider goodness-of-fit tests of the form

$$H_0 : f \in \mathcal{F}_0 = \{f_\theta, \theta \in \Theta\}, \qquad H_1 : f \notin \mathcal{F}_0,$$

in a model $X = (X_1, \dots, X_n) \sim P_f^n$, where $f$ belongs to a function space $\mathcal{F}$. Any Bayesian approach to such a problem requires the construction of a prior under $H_0$, say $\pi_0$ on $\Theta$, and a prior under $H_1$, say $\pi_1$ on $\mathcal{F}$, the latter being typically nonparametric and such that $\pi_1(\mathcal{F}_0) = 0$. The most common Bayesian answer to such a problem is the Bayes factor, which can be written as

$$B_{0/1} = \frac{\int_{\Theta} P_{f_\theta}(X)\, d\pi_0(\theta)}{\int_{\mathcal{F}} P_f(X)\, d\pi_1(f)}.$$

In this course we discuss some issues regarding the consistency of the Bayes factor, which is defined by $B_{0/1} \to +\infty$ as $n$ goes to infinity if $X \sim P_{f_\theta}$ (under $H_0$), and $B_{0/1} \to 0$ if $X \sim P_f$ with $f \notin \mathcal{F}_0$. Conditions will be given to ensure consistency both in the case where $\Theta$ is a singleton and in the case where $\Theta$ is a parametric model, but some conditions will also be presented under which the Bayes factor is not consistent.

**Harry van Zanten**

**Asymptotic theory for Gaussian process priors**

**I. Introduction**

In the first lecture we give a brief general introduction to Bayesian nonparametrics, focussing on the use of Gaussian process priors. We give examples, show how such priors can be used in various statistical settings, and briefly comment on computational issues.

**II. Contraction rates for Gaussian process priors**

In the second lecture we discuss the asymptotic behaviour of posterior distributions corresponding to Gaussian process priors. In particular, we explain how contraction rates are connected to the so-called small deviations behaviour of the prior and to the approximation of the function of interest by elements of the so-called RKHS of the prior. We compute contraction rates for several concrete Gaussian priors.
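One standard way to make this connection precise is through a concentration function that combines the two ingredients named above. The display below is a sketch in notation that does not appear in the abstract: $W$ is the Gaussian prior process, $w_0$ the true function, $\mathbb{H}$ the RKHS of the prior and $\|\cdot\|$ the norm of the ambient Banach space. The lecture itself may use a different formulation.

```latex
\varphi_{w_0}(\varepsilon)
  = \inf_{h \in \mathbb{H} :\, \|h - w_0\| < \varepsilon} \tfrac{1}{2}\,\|h\|_{\mathbb{H}}^{2}
    \;-\; \log \Pr\bigl(\|W\| < \varepsilon\bigr),
\qquad
\varphi_{w_0}(\varepsilon_n) \le n\,\varepsilon_n^{2}.
```

The first term measures how well $w_0$ is approximated by elements of the RKHS, and the second is the small-deviation exponent of the prior. A sequence $\varepsilon_n$ satisfying the inequality is then a candidate contraction rate, subject to the conditions of the relevant theorem.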

**III. Contraction rates for Gaussian process priors**

In the third lecture we consider the use of Gaussian process priors in hierarchical Bayes procedures. In particular, we explain how adaptive Bayes procedures can be constructed using conditionally Gaussian priors.

CONTRIBUTED SPEAKERS

**Julyan Arbel**

**Bayesian optimal adaptive estimation using a sieve prior**

We study the Bayes estimation of an infinite-dimensional parameter $\theta$. We propose a family of sieve priors and prove that the resulting Bayes estimators are adaptive optimal, both in posterior concentration rate and in risk convergence for the L2-norm loss. This result is applied to several models: density, regression and white noise. We prove that the same procedure is neither optimal nor adaptive for the pointwise L2-norm loss and give a lower bound for the rate.

**Alexandra Babenko**

**Oracle Posterior Rates in the White Noise Model**

We apply a Bayesian approach to the problem of estimating a signal observed in the white noise model and we study the rate at which the posterior distribution, the main quantity of interest in Bayesian analysis, concentrates around the true value of the signal. A new benchmark for the posterior concentration rate, the so-called posterior oracle rate, is proposed and studied. This is the smallest possible rate over a family of posterior rates corresponding to an appropriately chosen family of priors. To complement the upper bound results on the posterior concentration rate, we establish a lower bound result for the oracle rate. We also study implications for the model selection problem and present some simulations.

**Eleni Bakra**

**Applying spline smoothing regression on repeated measurements to link different cohorts**

Measurement of changes over the complete human lifespan is complex, and no single study can expect to represent the complete population even if started at birth. Therefore, techniques are needed that can draw on information from a range of studies to better understand the complete age range. However, combining studies poses challenges in ensuring that study variability and differences are properly modelled. Data from different cohorts across the life course are combined, drawing together random effects and smoothing techniques to link different sources. The aim is to model between-individual heterogeneity whilst producing a smoothed line across the life span, using a study in which blood pressure measures from three cohorts representing the entire life span have been collected.

**Dominique Bontemps**

**Bernstein-von Mises Theorems for Gaussian regression with increasing number of regressors**

This work contributes to the Bayesian theory of nonparametric and semiparametric estimation. We are interested in the asymptotic normality of the posterior distribution in Gaussian linear regression models when the number of regressors increases with the sample size. Two kinds of Bernstein-von Mises Theorems are obtained in this framework: nonparametric theorems for the parameter itself, and semiparametric theorems for functionals of the parameter. We apply them to the Gaussian sequence model and to the regression of functions in Sobolev and $C^{\alpha}$ classes, in which we get the minimax convergence rates. Adaptivity is reached for the Bayesian estimators of functionals in our applications.

**Maria Anna Di Lucca**

**A Non-parametric Bayesian Autoregressive Model for DNA-sequencing**

We consider the problem of base calling for data from high-throughput sequencing (HTS) experiments. We propose a non-parametric Bayesian approach. The proposed model generalizes earlier approaches based on mixtures of normals to mixtures of random probability measures. A complication arises from the inherently autoregressive nature of the data (phasing). We use a variation of dependent Dirichlet process (DDP) models that defines a non-parametric vector autoregressive model for the four-dimensional output from the four channels of the sequencing experiment.

**René de Jonge**

**Adaptive Bayesian estimation using tensor product splines**

In this talk I present a nonparametric procedure based on tensor product splines with Gaussian coefficients. The corresponding prior is conditionally Gaussian and (thus) provides a unified approach for a variety of statistical settings such as density estimation and regression. The goal is to show that the procedure adapts to the true smoothness and to determine the rate of posterior contraction around the truth.

**Soleiman Khazaei**

**Nonparametric Bayesian estimation of densities under monotonicity constraint**

In this paper we discuss consistency of the posterior distribution in cases where the Kullback-Leibler condition is not verified. This condition is stated as follows: for all $\epsilon > 0$, the prior probability of sets of the form $\{f : KL(f_0, f) < \epsilon\}$, where $KL(f_0, f)$ denotes the Kullback-Leibler divergence between the true density $f_0$ of the observations and the density $f$, is positive. This condition is in almost all cases required to obtain weak consistency of the posterior distribution, and thus also strong consistency. However, it is not a necessary condition. We therefore present a new condition to replace the Kullback-Leibler condition, which is useful in cases such as the estimation of decreasing densities. We then study some specific families of priors adapted to the estimation of decreasing densities and provide the posterior concentration rate for these priors, which is the same as the convergence rate of the maximum likelihood estimator. Some simulation results are provided.

**Bartek Knapik**

**Bayesian Inverse problems**

In this talk I will propose a Bayesian approach to inverse problems with Gaussian white noise, based on Gaussian priors. I will focus on two aspects of inverse problems: estimation of the full parameter of interest and estimation of linear functionals of it. For ease of presentation I will talk about so-called mildly ill-posed inverse problems, although the presented theory can easily be adapted to various other settings. In both the nonparametric and the linear functional case, the rate of contraction of the posterior distribution around the truth can be computed. Moreover, under some conditions on the prior, Bayesians can construct credible sets that coincide with frequentists’ confidence regions. The additional result in the linear functional setting is a Bernstein-von Mises theorem, which, under suitable conditions on the linear functional and the prior, shows that the centered posterior for the linear functional of the truth and the asymptotic distribution of an asymptotically efficient estimator centered at the truth are close in total variation norm.

**Willem Kruijer**

**On Bayesian estimation of the long-memory parameter in the FEXP-model for Gaussian time series**

Given observations $X_1, \dots, X_n$ from a zero-mean stationary Gaussian time series, we estimate its spectral density $f_0$ within the FEXP-model, which contains spectral densities of the form $|1 - e^{ix}|^{-2d} \exp\{\sum_{j=0}^{k} \theta_j \cos(jx)\}$. The FEXP-part $\exp\{\sum_{j=0}^{k} \theta_j \cos(jx)\}$ is a nonparametric model for the short-memory behavior of the time series, whereas the factor $|1 - e^{ix}|^{-2d}$ models its long-memory behavior. We study the semi-parametric problem of estimating the long-memory parameter $d$, considering $\theta$ as a nuisance parameter. The true spectral density $f_0$ is assumed to have long-memory parameter $d_0$ and an (infinite) FEXP-expansion of Sobolev-regularity $\beta > 1$. The related nonparametric problem of estimating $f_0$ itself has recently been studied by Rousseau, Chopin and Liseo (2010). They proved that under a Poisson or geometric prior on the dimension of the FEXP-expansion, the posterior concentrates around $f_0$ at almost the minimax rate of $n^{-\beta/(1+2\beta)}$. It is however unclear whether this also implies the minimax rate for the marginal posterior for $d$, which is known to be $n^{-(2\beta-1)/(4\beta)}$. We first prove that this is not the case under the Poisson or geometric prior for $k$. We give a counterexample, showing that the rate is at best $n^{-(2\beta-1)/(4\beta+2)}$. On the other hand, if a sieve prior $k \sim n^{1/(2\beta)}$ is used, we obtain the minimax rate for $d$.

**Steffen Ventz**

**Inference for non-homogeneous Markov Chains using Reinforced Urn Processes, with application to heart transplantation monitoring**

Reinforced Urn Processes (RUP) are a class of reinforced random walks on a countable state space of Pólya urns. Under suitable recurrence conditions, the RUP can be represented as a unique mixture of Markov chains, with known mixing measure. We construct a particular class of RUP on a countable state space and provide sufficient conditions for recurrence. Under recurrence assumptions, we use the unique mixture representation of the RUP and the exchangeability of the X0-Blocks to induce a nonparametric prior on the space of stochastic transition arrays for non-homogeneous Markov chains. Our process can be used for multi-state longitudinal problems, where the dependence through time might be Markovian but not necessarily time-homogeneous. Potentially several individuals are repeatedly observed through time. The individuals themselves may be judged to be exchangeable, whereas for a fixed individual the repeated measurements are assumed to follow a non-homogeneous Markovian structure, conditional on an unknown transition array. Exact posterior estimates for the unknown transition array and for functionals of the transition array can be obtained through the predictive distribution of the RUP. We apply the constructed RUP to the problem of heart transplantation monitoring.