
# YES II: “High dimensional statistics”

## Oct 6, 2008 - Oct 8, 2008

#### Summary

The workshop will be the second YES workshop. Its aim is to acquaint young statisticians (Ph.D. students and post-docs) with the latest developments in the analysis of high dimensional data. Research in this area has been driven by the needs of various disciplines which have been confronted by large data sets. The availability of such data is one consequence of advances in biology, medicine, engineering, astronomy, physics and chemistry, but also of the ready availability of computing power. Short courses, each consisting of three 45-minute talks, will be given.

#### Organizer

**Prof. P. L. Davies**, University of Duisburg-Essen, Germany/Eindhoven University of Technology, Eindhoven/EURANDOM, Eindhoven

#### Speakers

**Tutorial speakers**

**Gilles Blanchard**, Berlin

“PAC-Bayes bounds in learning theory, randomized algorithms, some recent related results, and relations to multiple testing, examples of applications”

**Nicolai Meinshausen**, Oxford

“Sparsity, consistent variable selection for high-dimensional data under $L_1$–penalties, underlying assumptions and violation of the same, running application on climate change”

**Sara van de Geer**, Zürich

“Empirical process theory, concentration and contraction inequalities, bounds for empirical risk minimizers, non-standard asymptotic distribution theory, classification problems, general oracle inequalities”

**Invited speakers**

**A. Alfons** (Vienna University of Technology)

“Complex Survey Data Sets: Visualization of Missing Values in R”

Complex survey data sets typically contain a considerable number of missing values. Deleting whole columns or rows of data matrices that contain missing values would result in a loss of important information. Therefore, normal practice is to use imputation methods to replace the missing values with meaningful ones. Visualization tools that show information about missing values may help not only to identify the mechanisms generating them, but also to gain insight into the quality and various other aspects of the underlying data. This knowledge is necessary for selecting an appropriate imputation method in order to reliably estimate the missing values. In this talk, several visualization tools for exploring the data and the structure of the missing values will be presented. These tools are implemented in the R package VIM (Visualization and Imputation of Missing values). A graphical user interface allows easy handling of the plot methods. This work was partly funded by the European Union (represented by the European Commission) within the 7th framework programme for research (Theme 8, Socio-Economic Sciences and Humanities, Project AMELI (Advanced Methodology for European Laeken Indicators), Grant Agreement No. 217322).

**G. Blanchard** (Fraunhofer FIRST (IDA))

“From FDR control in multiple testing to randomized estimation”

The multiple testing problem has a long history in statistics, but has enjoyed a new surge of interest in the last 10-15 years due to the ubiquitous availability of high-dimensional data, where the number of hypotheses to be tested increases linearly or polynomially with the dimension. In the main part of the course I will revisit some of the methods now considered standard for multiple testing under the False Discovery Rate (FDR) control condition, presenting new, simple technical tools which allow a somewhat unified point of view on some of these methods. I will also use the same tools to deal with so-called “adaptive” procedures. Finally, in the last part of the course I will switch the subject somewhat and explain how some of the ideas used in the context of multiple testing also find application in the domain of randomized estimation, leading to some insight into the theory of so-called “PAC-Bayes” bounds, in particular with applications to machine learning.
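As a point of reference for the FDR control condition mentioned above, the following is a minimal sketch of the classical Benjamini-Hochberg step-up procedure. It is only an illustration of one standard FDR method, not the unified tools presented in the course; the function name `benjamini_hochberg` and the example p-values are assumptions made for this sketch.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) such that p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, chosen only for illustration.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(p_values, alpha=0.05)
print(decisions)  # only the two smallest p-values are rejected
```

Note the step-up character of the rule: a hypothesis can be rejected even if its own p-value exceeds its threshold, as long as a larger p-value further down the ordering meets its threshold.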

**H. Cho** (University of Bristol)

“Detecting breakpoints in piecewise stationary AR processes”

We introduce a method for detecting breakpoints in piecewise stationary autoregressive processes, which is a variation on the theme of the Dantzig selector (Candes and Tao, 2007). We are motivated by the sparsity of the breakpoints over the observation period. Our method constructs a function with minimum total variation and bounded estimated residuals for each pre-estimated autocovariance sequence of the time series; the jumps in the function indicate likely locations of the changepoints of the process.

This type of approach is common in linear regression problems where it is known that solutions with the minimum l1 norm approximate sparse solutions (Donoho, 2006). Our method can be formulated as a computationally feasible linear programme and theoretical results show consistency in probability.

Simulations complement our study.

(joint work with Piotr Fryzlewicz)

**J. Dony** (Free University of Brussels)

“An empirical process approach to uniform in bandwidth consistency of kernel-type estimators”

Nonparametric density and regression estimation has been the subject of intense investigation for many years and this has led to a large number of methods. One very well-known and commonly used class of estimators consists of the so-called kernel-type estimators.

Contrary to the choice of the kernel, the choice of the bandwidth is more problematic, as it is responsible for an important bias/variance trade-off of the resulting kernel-type estimator. Typically, the bandwidth that is most appropriate will vary according to the situation and will depend on the available data. This means that one can no longer investigate the behavior of such “optimal” estimators (i.e. based upon data-dependent bandwidth sequences) via the classical results for estimators based upon deterministic bandwidth sequences.

In this talk, we will show how the theory of empirical processes can be used to prove (optimal) “uniform in bandwidth” consistency results for a wide variety of kernel-type estimators, meaning that a supremum over suitable ranges of bandwidths is added to the original asymptotic result. This extra supremum makes it possible to handle kernel-type estimators based upon bandwidths that are functions of the data and/or the location. The basic tools of this empirical-process-based approach are appropriate exponential deviation inequalities and moment inequalities for empirical processes.

**D. Facchinetti** (Università Cattolica del Sacro Cuore, Milano)

“Robust methods for the analysis of multivariate data: a comparison between some robust estimators and the Forward Search estimator”

**F. Gach** (University of Vienna)

“Efficiency in indirect inference”

Indirect inference is a simulation-based alternative to maximum likelihood estimation when neither an explicit nor a computable form of the likelihood function is available. The method was introduced in 1993 by Smith and also by Gouriéroux et al.; their proposed estimator turns out to be consistent and asymptotically normal but is efficient only under the somewhat restrictive assumption that the so-called auxiliary model is correctly specified. In this talk, I give an overview of the method and present a new framework leading to efficient estimation.

**S. Van de Geer** (ETH Zürich)

“On complex models and weighted empirical processes”

In these lectures we present a mix of empirical process theory and (penalized) M-estimation. We start out with a motivating example for studying empirical process theory. Consider $n$ observations from a $p$-dimensional linear regression model. High-dimensional regression is about the case where the number of covariables is much larger than the number of observations: $p \gg n$. In that situation, a certain amount of complexity regularization is needed, because e.g. ordinary least squares will lead to overfitting. We will address the question: how to choose the complexity penalty? Here, we use results for the modulus of continuity of the empirical process.

This approach will allow us to extend the situation to high-dimensional non-linear regression with $p \gg n$ covariables, where for each $j$ the dependence of the response on the $j$th covariable is an unknown function satisfying some smoothness constraints.

The empirical process results in this introductory regression problem will at first be taken for granted. The other lectures will then consider more details and extend the situation to general M-estimation problems.

**E. Herrholz** (University of Greifswald)

“Parsimonious (true) Histograms”

In the context of one-dimensional density estimation we consider a tube with piecewise constant boundaries around the e.c.d.f. It is already known that the taut string minimizes typical smoothness functionals as well as the number of modes in such tubes. The latter provides a histogram with a minimal number of local extremes. A related optimization problem is to obtain a histogram with the smallest number of (unequal length) bins. In this case the taut string does not yield an optimal solution, but it can be used as a first step. We developed an algorithm that provides a representation of one-dimensional data by a piecewise constant function which has both a minimal number of local extremes and a minimal number of bins. Unfortunately, this solution is not a histogram by definition, since the area of a bin is not proportional to the frequency of the data lying in the corresponding interval. Hence, we search for alternatives that are true histograms and compare their properties with our solution.

**W. Koolen-Wijkstra** (CWI)

“Combining Expert Advice Efficiently”

We show how models for prediction with expert advice can be defined concisely and clearly using hidden Markov models (HMMs); standard HMM algorithms can then be used to efficiently calculate how the expert predictions should be weighted according to the model. We cast many existing models as HMMs and recover the best known running times in each case. We also describe two new models: the switch distribution, which was recently developed to improve Bayesian/Minimum Description Length model selection, and a new generalisation of the fixed share algorithm based on run-length coding. We give loss bounds for all models and shed new light on the relationships between them.

**Kyung In Kim** (Eindhoven University of Technology)

“Combining clustering and discrete multiple testing in high-dimensional data analysis: an application to array CGH data”

Implementing FDR procedures for detecting significant clone regions in array CGH data is quite different from the implementation for cDNA microarray data. Above all, array CGH data are usually encoded as integer copy numbers, which are discrete, while cDNA microarray data are encoded as continuous quantities. Moreover, the measurement units of array CGH data (often called clones) have a different nature: the genes on a cDNA microarray are themselves biologically functional units, whereas a single clone is hardly considered a functional unit; rather, certain contiguous clones are often regarded as a meaningful unit. In this context, we consider combining clustering and multiple testing procedures for array CGH data. We first performed an exploratory data analysis to model spatial dependence in the data. Throughout the segmentation [1, 4], calling [2] and collapsing [3] algorithms, we assumed that the data are discretized by encoding gain, normal and loss. The exploratory analysis confirmed the spatial dependence as strong block-wise dependence, so we proposed a likelihood-based model for clusters of the discretized data using physical distance information between cluster regions. We also proposed an efficient optimization algorithm for the clustering.

**A. Lykou** (Lancaster University)

“Sparse CCA using Lasso”

Collinearity among regressors may make ordinary least squares (OLS) estimates unreliable and difficult to interpret. This arises in electronic data recording, where typically the number of regressors is larger than the number of observations. It is well known that shrinkage methods, for instance ridge regression, may lead to smaller prediction error. This error may be further reduced by setting some regression coefficients to zero and just using the resulting subset. The past decade has seen the development of methods that simultaneously perform shrinkage and selection, including the nonnegative garrote, the lasso and the elastic net, all giving sparse predictors. Here we extend the application of the positive lasso to construct a sparse version of canonical correlation analysis (CCA). CCA describes the relationship between two multidimensional variables by finding linear combinations of the variables with maximal correlation. The canonical variates can be obtained either by solving the eigenvalue equations for the covariance matrix or by using an alternating least squares algorithm. We propose a method of estimating canonical variates by combining the alternating least squares (ALS) algorithm with the positivity constrained lasso or lars-lasso algorithm. Properties of this method of estimating a sparse CCA and illustrative examples are given in the full version of this paper.

(joint work with J. Whittaker (Lancaster University))

**M. Pauly** (Heinrich-Heine-Universitaet)

“About the quality of permutation tests for potentially unbalanced two-sample problems”

It is well known that permutation tests for two-sample problems are not only finite sample distribution free but also of exact level alpha under the null hypothesis of exchangeability. However, in more general settings the null hypothesis is heterogeneous. In this case the permutation tests are in general not even asymptotically of level alpha. Following an approach of Neuhaus, it is shown by example how this can be resolved by using studentized versions of the test statistic. Moreover, the asymptotic behaviour of the corresponding power function is studied.
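The idea of permuting a studentized statistic can be sketched as follows. This is a minimal illustration, assuming a Welch-type studentization of the difference of means (separate variance estimates per group); the specific statistic treated in the talk may differ, and the function names and sample data are hypothetical.

```python
import math
import random


def welch_statistic(x, y):
    """Difference of means divided by its estimated standard error,
    using a separate variance estimate for each sample."""
    nx, ny = len(x), len(y)
    mean_x, mean_y = sum(x) / nx, sum(y) / ny
    var_x = sum((v - mean_x) ** 2 for v in x) / (nx - 1)
    var_y = sum((v - mean_y) ** 2 for v in y) / (ny - 1)
    return (mean_x - mean_y) / math.sqrt(var_x / nx + var_y / ny)


def studentized_permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the studentized statistic."""
    rng = random.Random(seed)
    observed = abs(welch_statistic(x, y))
    pooled = list(x) + list(y)
    nx = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Recompute the studentized statistic on the permuted split.
        if abs(welch_statistic(pooled[:nx], pooled[nx:])) >= observed:
            exceed += 1
    # The "+1" terms count the observed statistic itself, keeping p > 0.
    return (exceed + 1) / (n_perm + 1)


# Unbalanced samples with unequal spread and clearly different locations
# (made-up data, for illustration only).
x = [0.1 * i for i in range(10)]
y = [5.0 + 0.5 * i for i in range(20)]
p_value = studentized_permutation_test(x, y)
print(p_value)
```

The statistic is recomputed, including its variance estimates, on every permuted split; permuting only the raw mean difference would not account for the unequal variances and sample sizes.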

**N. Meinshausen** (University of Oxford)

“Applications and Algorithms for Sparse Signal Recovery from High-Dimensional Data”

Many scientific disciplines have witnessed an explosion of available data in recent years. Statistics is supposed to help make sense of these data. The challenges are twofold. Algorithms have to be computationally efficient and they have to lead to interpretable results, separating the interesting pieces in the data from the often large amount of uninteresting information. In the context of high-dimensional regression, L1-penalization fits the bill and has thus enjoyed considerable interest in recent years. I will show applications of L1-penalized estimation, ranging from biological applications to compressed sensing in image analysis to graphical modelling and machine learning. I will then discuss some theory about the possibility of sparse signal recovery with L1-penalization and discuss proposed extensions that improve upon the L1-penalized estimator in various ways.
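To make the role of the L1 penalty concrete, here is a minimal sketch of L1-penalized least squares solved by cyclic coordinate descent with soft-thresholding. The objective scaling $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$, the function names and the toy data are assumptions for this illustration and are not taken from the talk.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic updates.

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_beta
    return beta


# Toy design with orthogonal columns: the first covariable carries signal,
# the second only a small effect that the penalty sets exactly to zero.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
beta_hat = lasso_coordinate_descent(X, y, lam=0.1)
print(beta_hat)
```

In this toy example the small coefficient is set exactly to zero while the large one is shrunk slightly, which is the combination of selection and shrinkage that makes the L1-penalized estimator interpretable.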

**Y. Phinikettos** (Imperial College)

“Adaptive Computation of p-values with an application to Model-checks”

**R. Schlicht** (Helmholtz Center Munich)

“Delay stochastic processes: Theory, simulation, and application to developmental biology”

Molecular processes in cell biology are traditionally modelled by deterministic differential equations. These rely heavily on the assumption of large molecule numbers. If some or all molecules appear in small numbers, stochastic effects become significant and can no longer be ignored.

In addition, molecular processes that generate oscillations often involve delays. As a result, reaction rates can depend on molecule numbers in the entire past.

We here study a non-Markovian stochastic model that provides a precise mathematical description of these phenomena. The explicit construction of the model is well suited for both theoretical analysis and direct simulation. Exact analytical expressions for the reaction rates can be derived.

**A. Schmitz** (University of Cologne)

“Monitoring of changes in linear models with dependent errors”

Horváth et al. (2004) developed a monitoring procedure for detecting a change in the parameters of a linear regression model having independent and identically distributed errors. We extend these results to allow for strongly mixing errors and we also provide a class of consistent variance estimators. Limit properties are derived using invariance principles for partial sums of mixing random variables. This is joint work with Josef G. Steinebach.

**U. Schneider** (University of Vienna)

“On the Distribution of the Adaptive LASSO Estimator”

We study the distribution of the adaptive LASSO estimator (Zou, 2006) for an orthogonal normal linear regression model in finite samples as well as in the large-sample limit. We show that these distributions are typically highly non-normal regardless of the choice of tuning of the estimator. The uniform convergence rate is obtained and shown to be slower than $n^{-1/2}$ in case the estimator is tuned to perform consistent model selection. Moreover, we derive confidence intervals based on the adaptive LASSO and also discuss the questionable statistical relevance of the ‘oracle’-property of this estimator. Simulation results for the non-orthogonal case complement and confirm our theoretical findings for the orthogonal case.

(joint work with B.M. Pötscher (University of Vienna))

**T. Van Erven** (CWI)

“The Catch-Up Phenomenon in Bayesian Model Selection”

Bayesian model averaging, model selection and its approximations such as BIC are generally statistically consistent, but sometimes achieve slower rates of convergence in prediction of future data than other methods such as AIC and leave-one-out cross-validation. On the other hand, these other methods can be inconsistent. This is called the AIC-BIC dilemma.

I will present the *catch-up phenomenon*, which is a new explanation for the slow convergence of Bayesian methods. Based on this explanation I will define the *switch distribution*, a modification of the Bayesian marginal distribution. Under broad conditions, model selection and prediction based on the switch distribution is both consistent and achieves optimal convergence rates, thereby resolving the AIC-BIC dilemma.

(joint work with Peter Grünwald and Steven de Rooij)

**A. Verhasselt** (Katholieke Universiteit Leuven)

“Nonnegative garrote in additive models using P-splines”

The nonnegative garrote method was proposed as a variable selection method by Breiman (1995). In this talk we consider additive modeling and apply the nonnegative garrote method for selecting among the independent variables. For initial estimation of the unknown univariate functions, we use P-splines estimation (Eilers & Marx (1996)) and backfitting is applied to deal with the additive modeling. We compare the proposed method involving P-splines with some other methods for additive models. The finite-sample performance of the procedure is investigated via a simulation study and an illustration with real data is provided.

(joint work with A. Antoniadis (Laboratoire Jean Kuntzmann, Université Joseph Fourier) and I. Gijbels (Katholieke Universiteit Leuven))