YES V: Adaptation in Nonparametric Statistics
Oct 10, 2011 - Oct 12, 2011
The quality of statistical inference essentially depends on how complex we assume the underlying statistical model to be: generally, the richer the model, the worse the quality of the statistical inferences. On the other hand, if the proposed model is too simple, it may not provide a reasonable fit to the data. In an adaptive setup, instead of one particular model, one deals with a family of models, often ordered or nested from simple to complex. Depending on the statistical problem at hand (for instance, regression function estimation, hypothesis testing, or confidence set construction), the general problem of adaptation is, loosely formulated, to design a so-called adaptive method that performs in the multiple-model situation as well as it would in a single model or, if this is not possible, with the smallest possible loss of quality.
In the last two decades, several adaptive methods (optimal in one sense or another) have been developed: cross-validation, the blockwise method, Lepski's method, wavelet thresholding, penalized estimators, etc. Most of these adaptive methods address estimation problems.
Much less is known about adaptive confidence sets; how to define and construct an optimal adaptive confidence set appears to be a subtle issue. Recently, this topic has attracted increasing interest in the statistical community.
To compare different statistical procedures, a machinery for optimality considerations within the minimax framework was built up in the 1980s. More recently, a new approach to optimality has been developed, namely the oracle approach. The interplay between these two approaches, minimax and oracle, is one of the main topics of this workshop.
The present workshop is directed at statisticians, in particular Ph.D. students, postdocs and junior researchers, who are interested in the subject of adaptation in nonparametric models.
|Angelika Rohde||Universität Hamburg|
|Eduard Belitser||TU Eindhoven|
|Geurt Jongbloed||TU Delft|
|Alexander Goldenshluger||University of Haifa|
|Richard Nickl||University of Cambridge|
|Laurent Cavalier||University Aix-Marseille 1|
|Eduard Belitser||TU Eindhoven|
Monday October 10
|10:45-11:30||Alexander Goldenshluger||Introduction to adaptive nonparametric estimation by selection of estimators|
|11:35-12:20||Laurent Cavalier||Inverse problems in statistics|
|13:30-14:15||Eduard Belitser||Oracle approach, interplay with minimax adaptation|
|14:20-14:40||Christoph Breunig||Adaptive estimation of functionals in nonparametric instrumental regression|
|14:45-15:05||Maik Schwarz||Adaptive estimation in a Gaussian sequence model|
|15:45-16:30||Laurent Cavalier||Oracle inequalities in inverse problems|
|16:35-16:55||Itai Dattner||On deconvolution of distribution functions|
Tuesday October 11
|9:00-9:45||Richard Nickl||Confidence sets in nonparametric statistics|
|9:50-10:10||Jakob Söhl||Confidence sets in nonparametric calibration of exponential Lévy models|
|10:45-11:30||Alexander Goldenshluger||General procedure for selecting linear estimators|
|11:35-11:55||Claudia Strauch||Sharp adaptive drift estimation in multidimensional diffusion models|
|12:00-12:20||Rudolf Schenk||Adaptive local functional linear regression|
|13:30-14:15||Eduard Belitser||Lower bounds, Bayesian model selection and adaptive confidence sets|
|14:20-14:40||Catia Scricciolo||Adaptive Bayesian density estimation using Dirichlet process Gaussian mixture priors|
|14:45-15:05||Botond Szabo||Understanding the asymptotic behaviour of the empirical Bayes method|
|15:45-16:30||Richard Nickl||Adaptive confidence sets I – confidence bands|
|16:35-16:55||Adam Bull||Honest adaptive confidence bands|
|18:30 –||Conference dinner|
Wednesday October 12
|9:00-9:45||Alexander Goldenshluger||Aggregation of estimators|
|9:50-10:35||Laurent Cavalier||Risk hull method|
|11:15-12:00||Richard Nickl||Adaptive confidence sets II|
Eduard Belitser (TU Eindhoven)
Talk I: Oracle approach, interplay with minimax adaptation
A classical approach to optimality considerations in adaptation problems is via the minimax framework. There is another way to look at the adaptation problem, namely via the so-called oracle approach. We introduce the notions of oracle inequality, oracle risk and oracle estimator, and describe the oracle approach in a general setting. We also discuss the interplay between the oracle and minimax frameworks. The main message here is that, loosely speaking, an oracle inequality implies adaptive minimaxity results over all functional classes which are covered by the (not too rich) family of estimators appearing in the oracle inequality. We also describe two Bayesian approaches to the adaptation problem: the pure Bayes and empirical Bayes approaches.
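As a rough illustration (in notation chosen here, not taken verbatim from the talk), a typical oracle inequality for a family of estimators $\{\hat f_\lambda\}_{\lambda\in\Lambda}$ and a data-driven choice $\hat\lambda$ takes the form

```latex
\mathbb{E}\,\bigl\|\hat f_{\hat\lambda} - f\bigr\|^2
  \;\le\; C \min_{\lambda \in \Lambda} \mathbb{E}\,\bigl\|\hat f_{\lambda} - f\bigr\|^2 \;+\; r_n,
  \qquad C \ge 1,
```

with a small remainder term $r_n$. If, for every functional class in a given scale, some member $\hat f_\lambda$ of the family attains the minimax rate over that class, such an inequality immediately yields adaptive minimaxity over the whole scale, which is the interplay between the two frameworks referred to above.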
Talk II: Bayesian adaptation, oracle posterior rates
We introduce an oracle optimality framework for the Bayesian approach. A statistical model and a family of priors determine the corresponding family of posterior rates. The oracle prior corresponds to the best posterior rate (called the posterior oracle rate). Our goal is to design an adaptive prior that mimics the performance of the oracle prior. We apply the Bayesian oracle approach to the problem of projection estimation of a signal observed in the Gaussian white noise model. The proposed family of priors models the projection estimation oracle in the sense that the family of resulting posterior rates essentially coincides with the family of risks of the projection estimators. Under an appropriate hierarchical prior, we study the performance of the resulting posterior distribution (appropriately adjusted by the empirical Bayes approach) and establish that the posterior concentrates around the true signal at the oracle projection convergence rate.
Talk III: Lower bounds, Bayesian model selection and adaptive confidence sets
We complement the upper bound results on the posterior rate from the second talk by a lower bound for the oracle posterior rate. When applying the Bayesian approach to adaptation problems, besides the original statistical inference problem, one can consider the attendant problem of a data-based choice of the structural parameter which indexes the model; this attendant problem can be regarded as a model selection problem. We study the implications of the results from the second talk for the model selection problem: namely, we propose a Bayes model selector and assess its quality in terms of the so-called false selection probability. Finally, we touch upon the problem of constructing adaptive confidence sets by means of a Bayesian approach.
Christoph Breunig (Universität Mannheim)
Adaptive estimation of functionals in nonparametric instrumental regression
We consider the problem of estimating the value l(g) of a linear functional, where the structural function g models a nonparametric relationship in the presence of instrumental variables. We propose a plug-in estimator which is based on a dimension reduction technique and additional thresholding. It is shown that this estimator is consistent and can attain the minimax optimal rate of convergence under additional regularity conditions. This, however, requires an optimal choice of the dimension parameter m depending on certain characteristics of the structural function g and the joint distribution of the regressor and the instrument, which are unknown in practice. We propose a fully data-driven choice of m which combines model selection and Lepski's method. We show that the adaptive estimator attains the optimal rate of convergence up to a logarithmic factor. The theory is illustrated under classical smoothness assumptions, and we discuss examples such as pointwise estimation or estimation of averages of the structural function g.
Laurent Cavalier (University Aix-Marseille I)
Talk I: Inverse problems
There exist many fields where inverse problems appear. Some examples are: astronomy (blurred images of the Hubble satellite), econometrics (instrumental variables), financial mathematics (model calibration of the volatility), medical image processing (X-ray tomography) and quantum physics (quantum homodyne tomography). These are problems where we have indirect observations of an object (a function) that we want to reconstruct, through a linear operator $A$. One needs regularization methods in order to get a stable and accurate reconstruction. We present the framework of statistical inverse problems where the data are corrupted by some stochastic error. This white noise model may be discretized in the spectral domain using Singular Value Decomposition (SVD), when the operator $A$ is compact. Several examples of inverse problems where the SVD is known are presented (circular deconvolution, tomography). We explain some basic issues regarding nonparametric statistics applied to inverse problems. Standard regularization methods are presented (projection, Landweber, Tikhonov,…).
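As a sketch in standard notation (an illustration, with symbols chosen here), the white noise model $Y = Af + \varepsilon W$ with a compact operator $A$ and SVD system $(b_k, \varphi_k, \psi_k)$ reduces, after projecting onto the singular functions, to the Gaussian sequence model

```latex
y_k \;=\; b_k\,\theta_k \;+\; \varepsilon\,\xi_k, \qquad k = 1, 2, \ldots,
```

where $\theta_k = \langle f, \varphi_k\rangle$ are the unknown coefficients, the $\xi_k$ are i.i.d. standard Gaussian, and the singular values $b_k \to 0$. Recovering $\theta_k$ requires dividing by the small $b_k$, which amplifies the noise and explains the need for regularization.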
Talk II: Adaptation and oracle inequalities in inverse problems
Several classical statistical approaches, such as minimax risk and optimal rates of convergence, are presented. Optimal rates of convergence are given for estimating functions in Sobolev and analytic classes of functions. The notion of optimal rate of convergence leads to an optimal choice of the tuning parameter. However, these optimal parameters are unachievable since they depend on the unknown smoothness of the function. This leads to more recent concepts such as adaptive estimation and oracle inequalities. A data-driven selection procedure for the regularization parameter based on Unbiased Risk Estimation (URE) is presented. Oracle inequalities are obtained for this specific data-driven selection procedure.
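To make the URE idea concrete, here is a minimal toy sketch (our own example, not code from the talk) for the direct Gaussian sequence model $y_k = \theta_k + \varepsilon \xi_k$, where the projection estimator keeps the first $N$ coefficients and URE(N) estimates its risk without knowing $\theta$:

```python
import math
import random

random.seed(0)
eps = 0.05                          # noise level
n = 100
# hypothetical smooth signal: polynomially decaying coefficients (an assumption)
theta = [k ** -1.5 for k in range(1, n + 1)]
y = [t + eps * random.gauss(0, 1) for t in theta]

def ure(N):
    # Unbiased risk estimate for the projection estimator keeping the first N
    # coefficients: E[ure(N)] = N*eps^2 + sum_{k>N} theta_k^2, i.e. its true risk
    return N * eps ** 2 + sum(y[k] ** 2 - eps ** 2 for k in range(N, n))

# data-driven choice of the cut-off: minimize the unbiased risk estimate
N_hat = min(range(n + 1), key=ure)
```

Since the expectation of `ure(N)` equals the risk of the projection estimator, minimizing it mimics the oracle choice of $N$; in inverse problems the same principle applies after dividing the observations by the singular values.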
Talk III: Risk hull method
We consider the Gaussian white noise model in inverse problems where $A$ is a known compact operator with singular values converging to zero with polynomial decay. The unknown function $f$ is recovered by a projection method using the SVD of $A$, a method also called truncated SVD or spectral cut-off. The bandwidth choice $N$ of this projection regularization is governed by a data-driven procedure based on the principle of risk hull minimization (RHM). This new method may be presented as penalized empirical risk minimization with a penalty slightly stronger than the usual URE (or Akaike) penalty. We provide oracle inequalities for the mean square risk of this method and show, in particular, that in numerical simulations this approach may substantially improve on the classical method of unbiased risk estimation.
Adam Bull (Cambridge University)
Honest adaptive confidence bands
Confidence bands are confidence sets for an unknown function, containing all functions within some sup-norm distance of an estimator. We consider the problem of constructing adaptive confidence bands, whose width contracts at an optimal rate over a range of Hölder classes. While adaptive estimators exist, in general adaptive confidence bands do not, and to proceed we must place further assumptions on the unknown function. We discuss previous approaches to this issue, and show it is necessary to restrict to fundamentally smaller classes of functions. We then consider the self-similar functions, whose Hölder norm is similar at large and small scales. We show that such functions may be considered typical functions of a given Hölder class, and that the assumption of self-similarity is both necessary and sufficient for the construction of adaptive bands. Finally, we show that this assumption allows us to resolve the problem of undersmoothing, yielding bands which are honest simultaneously for functions of any Hölder norm.
Itai Dattner (Eurandom)
On deconvolution of distribution functions
It is well known that rates of convergence of estimators in deconvolution problems are affected by the smoothness of the error density and of the density to be estimated. However, the problem of distribution deconvolution is more delicate than previously appreciated. We derive different rates of convergence with respect to the tail behavior of the error characteristic function. We present optimal-in-order deconvolution estimators, both for known and unknown error distributions. An adaptive estimator which achieves the optimal rates within a logarithmic factor is developed. Simulation studies comparing the adaptive estimator to other methods are presented and support the superiority of our method. An example with real data is also discussed. Based on joint work with Alexander Goldenshluger and Benjamin Reiser.
Alexander Goldenshluger (University of Haifa)
Talk I: Introduction to adaptive nonparametric estimation by selection of estimators
In the first talk we survey the problem of adaptive nonparametric estimation of a univariate regression function. The objective is to construct an estimator, optimal in the minimax sense, that does not require any prior information on the smoothness of the regression function. Our discussion will concentrate on methods based on selection of linear estimators from a given collection. These methods originate in the works of Oleg Lepski in the early 1990s. We present corresponding minimax and adaptive minimax results and discuss various extensions.
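A minimal sketch of the selection idea (a toy version under assumptions of our own choosing, not Lepski's construction in full generality): given local-average estimators of a regression function at a point, computed on an ordered grid of bandwidths, accept the largest bandwidth whose estimate is consistent, up to a constant times the noise level, with every smaller-bandwidth estimate:

```python
import math
import random

random.seed(1)
n = 400
sigma = 0.3
xs = [i / n for i in range(n)]
# hypothetical regression function sin(2*pi*x) observed with Gaussian noise
ys = [math.sin(2 * math.pi * x) + sigma * random.gauss(0, 1) for x in xs]

x0 = 0.25        # estimation point; the true value is sin(pi/2) = 1

def local_mean(h):
    # local-average estimator of f(x0) with bandwidth h, and its standard deviation
    vals = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
    return sum(vals) / len(vals), sigma / math.sqrt(len(vals))

bandwidths = [0.4 / 2 ** j for j in range(6)]   # ordered from largest to smallest
kappa = 2.0                                     # tuning constant (our choice)
est = {h: local_mean(h) for h in bandwidths}

# Lepski-type rule: take the largest bandwidth whose estimate agrees, within
# kappa standard deviations, with every estimate at a smaller bandwidth
h_hat = bandwidths[-1]
for m, h in enumerate(bandwidths):
    if all(abs(est[h][0] - est[hj][0]) <= kappa * est[hj][1]
           for hj in bandwidths[m + 1:]):
        h_hat = h
        break
```

The largest accepted bandwidth balances bias (which makes large-bandwidth estimates drift away from the small-bandwidth ones) against variance, without requiring knowledge of the smoothness of the regression function.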
Talk II: General procedure for selecting linear estimators
The second talk deals with adaptive estimation of multivariate functions from noisy observations. We present a general selection procedure and derive oracle inequalities for the risk of the selected estimator. The proposed selection procedure leads to adaptive minimax estimators in a wide variety of estimation settings. In particular, the resulting estimators can adapt both to unknown smoothness and to the structure of the function to be estimated.
Talk III: Aggregation of estimators
The third talk is about the aggregation problem. The goal is, based on the noisy observations, to select an estimator from a fixed collection of arbitrary estimators so that the accuracy of the selected estimator is as close as possible to the accuracy of the best estimator in the collection. We present an aggregation scheme that applies to families of arbitrary estimators; it is easily extended to different models and global accuracy measures. We derive oracle inequalities and show that they cannot be improved in the minimax sense.
Richard Nickl (University of Cambridge)
Talk I: Confidence Sets in Nonparametric Statistics
We shall give a broad introduction to the confidence problem in nonparametric statistics, starting with classical results due to Kolmogorov, Smirnov, Bickel and Rosenblatt. The focus will be on nonparametric regression, density and distribution function estimation, and we will explain some of the mathematical machinery that is needed in the theory, mostly from empirical process theory.
Talk II: Adaptive Confidence Sets I — Confidence Bands
We shall discuss the important topic of nonparametric confidence bands, and explain the theory of adaptation in this case, where the size of the confidence set is measured in uniform norm. We shall review the classical ‘negative’ results due to Low and some very recent developments in this field that introduce a ‘separation’ approach to confidence sets.
Talk III: Adaptive Confidence Sets II
We shall consider the existence of adaptive confidence sets in the most commonly used loss-function in nonparametrics, $L^2$-loss (MISE). We shall review some nice results from the last decade, discuss why the situation is substantially different than it is for confidence bands, and then give a unified explanation of when adaptive confidence sets exist by linking the ‘geometry’ of the problem with nonparametric testing theory.
Rudolf Schenk (Université catholique de Louvain)
(joint work with Jan Johannes)
Adaptive local functional linear regression
We consider the estimation of the value of a linear functional of the slope parameter in functional linear regression, where scalar responses are modeled in dependence of random functions. Johannes and Schenk propose a plug-in estimator which is based on dimension reduction and additional thresholding and show that this estimator can attain the minimax optimal rate of convergence up to a constant. However, this estimation procedure requires an optimal choice of the dimension parameter with regard to certain characteristics of the slope function and the covariance operator of the regressor. As these are unknown in practice, we investigate a fully data-driven choice of the dimension parameter using a variation of the classical model selection approach. The construction of the proposed estimator involves both an estimated penalized contrast function and an estimated collection of models. We show that this adaptive procedure attains the lower bound for the minimax risk up to a logarithmic factor over a wide range of classes of slope functions and covariance operators. In particular, our theory covers pointwise estimation as well as the estimation of local averages of the slope parameter.
Maik Schwarz
Adaptive estimation in a Gaussian sequence model
Catia Scricciolo (Bocconi University, Italy)
Adaptive Bayesian density estimation using Dirichlet process Gaussian mixture priors
We consider Bayesian nonparametric estimation of smooth densities using infinite Gaussian mixtures. The posterior distribution corresponding to a Dirichlet process Gaussian mixture prior is shown to shrink around the data-generating distribution at a minimax optimal rate, up to a logarithmic factor, for any smoothness degree of the sampling density. Thus, the corresponding Bayes’ estimator is fully rate adaptive.
Jakob Söhl (Humboldt-Universität zu Berlin)
Confidence sets in nonparametric calibration of exponential Lévy models
In this talk we consider statistical inference for exponential Lévy models. We consider Lévy processes with a jump component of finite intensity and absolutely continuous jump distribution. In the estimation method the exponent of the Lévy-Khintchine representation is estimated first and then the diffusion coefficient, the drift and the Lévy measure are estimated. The estimators are based on a cutoff-scheme in the spectral domain. To analyze the asymptotic distribution of the estimators we simplify the observation scheme and work with continuous observations given by the Gaussian white noise model. We show that the estimators of the diffusion coefficient, the drift and the jump intensity are asymptotically normally distributed. We also derive asymptotic normality for the pointwise estimation of the Lévy density and study the joint distribution of these estimators. Together with the choice of undersmoothing cut-off values, these results on the asymptotic normality of the estimators allow us to construct confidence intervals and confidence sets.
Claudia Strauch (Universität Hamburg)
Sharp adaptive drift estimation in multidimensional diffusion models
We consider the problem of adaptively estimating the drift function of a multivariate ergodic diffusion. Exact adaptive estimation procedures are proposed, both for global and pointwise estimation. The sharp results in particular reflect the influence of the diffusion matrix on the problem of drift estimation. We briefly discuss the problem under specific additional constraints like a single index structure and indicate its behavior in higher dimension.
Botond Szabo (TU Eindhoven)
(joint work with Aad van der Vaart and Harry van Zanten)
Understanding the asymptotic behaviour of the empirical Bayes method
In recent years there has been a huge increase in the use of Bayesian methods in high-dimensional and nonparametric statistical problems. One very popular adaptive Bayesian technique is the empirical Bayes method, which is widely used in practice, for example in ecology, genomic data analysis, high-dimensional classification, revenue sharing and quality assurance. Although it has a wide area of application, the technique itself does not yet have a full theoretical underpinning. In my talk I aim to contribute to the fundamental understanding of this widely used method. In Bayesian nonparametrics it is well known that the performance of a statistical procedure depends crucially on the choice of the prior distribution. Wrong choices can result in a posterior distribution that does not concentrate around the “true” parameter, or that does contract, but at a sub-optimal rate. A common approach that helps to avoid this problem is to work with a whole family of prior distributions, indexed by one or more scaling parameters. Popular adaptive methods for choosing appropriate values of these hyperparameters are full, hierarchical Bayes procedures and empirical Bayes methods. We study the latter approach in the context of the Gaussian white noise model and compare its performance to an oracle procedure that uses the optimal, deterministic scaling yielding the minimax rate of convergence. We prove that in some cases the empirical Bayes method matches the performance of the oracle, while in other cases it gives a significantly worse contraction rate than the oracle.
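To illustrate the mechanism (a toy sketch under a prior family of our own choosing, not the construction from the talk), consider the sequence formulation of the Gaussian white noise model with priors $\theta_k \sim N(0, k^{-1-2\alpha})$ indexed by a smoothness hyperparameter $\alpha$; the empirical Bayes method selects $\alpha$ by maximizing the marginal likelihood of the data:

```python
import math
import random

random.seed(2)
n = 500
true_alpha = 1.0
# draw a "truth" from the alpha = 1 prior (an assumption for illustration)
theta = [random.gauss(0, k ** -(0.5 + true_alpha)) for k in range(1, n + 1)]
y = [t + random.gauss(0, 1 / math.sqrt(n)) for t in theta]

def log_marginal(alpha):
    # under the prior theta_k ~ N(0, k^{-1-2*alpha}), the observations are
    # marginally independent with y_k ~ N(0, k^{-1-2*alpha} + 1/n)
    ll = 0.0
    for k, yk in enumerate(y, start=1):
        v = k ** -(1 + 2 * alpha) + 1 / n
        ll -= 0.5 * (math.log(v) + yk ** 2 / v)
    return ll

# empirical Bayes: maximize the marginal likelihood over a grid of alpha values
grid = [0.1 * i for i in range(1, 41)]
alpha_hat = max(grid, key=log_marginal)
```

Plugging $\hat\alpha$ back into the prior yields the empirical Bayes posterior; the question studied in the talk is when the contraction rate of this data-driven procedure matches that of the oracle choice of the scaling.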