# YES III: “Paradigms of Model Choice”

## Oct 5, 2009 - Oct 7, 2009

#### Summary

This is the third workshop in the series of YES (Young European Statisticians) workshops. The first was held in October 2007 on Shape Restricted Inference with seminars given by Lutz Dümbgen (Bern) and Jon Wellner (Seattle) together with shorter talks by Laurie Davies (Duisburg-Essen) and Geurt Jongbloed (Delft). The second workshop was held in October 2008 on High Dimensional Statistics with seminars given by Sara van de Geer (Zürich), Nicolai Meinshausen (Oxford) and Gilles Blanchard (Berlin).

The present workshop is directed at young statisticians (mainly Ph.D. students and postdocs) who are interested in the problem of model choice.

Short seminars, each consisting of three 45-minute talks, will be given on various aspects of model choice.

Model choice has for many years been a point of disagreement and research in statistics. The applications range from the choice between several low dimensional models, all of which are reasonable models for the data, to the choice of variables to be included in a linear regression and the choice of smoothing parameter in nonparametric regression, inverse and other ill-posed problems. Over the years several techniques have been developed, such as AIC, BIC, MDL (Minimum Description Length), cross-validation, Lasso (more generally L₁-penalization) and FIC (Focused Information Criterion). Many of these techniques have proved successful for certain types of problem, but there is still a need for a discussion of the principles (if any) involved as well as the advantages and disadvantages of these approaches. It is the aim of the workshop to inform the participants of the state of the art in each of these paradigms and to encourage discussion between the different schools. It is also intended that examples from real problems be given for each of the paradigms of model choice, to demonstrate their applicability to the analysis of data.

Each of the speakers will concentrate on the problem of model choice from their own perspective.

#### Organizers

**Prof. P. L. Davies**, University of Duisburg–Essen, Germany / Eindhoven University of Technology, Eindhoven / EURANDOM, Eindhoven

**Prof. G. Jongbloed**, University of Technology Delft

#### Speakers

**Keynote:**

**Professor Laurie Davies**, Duisburg-Essen

**Approximate models and regularization**

This approach to model choice is based on the idea of approximate models. A model is regarded as an adequate approximation to a data set if ‘typical’ data generated under the model ‘look like’ the real data. The word ‘typical’ is made precise by specifying a real number α, 0 < α < 1, which determines what proportion of the data sets generated under the model are to be regarded as typical. The words ‘look like’ must be operationalized (in practice often in the form of a computer program) so that for any model and any data set it is possible to decide whether the model is an adequate approximation to the data. The precise nature of this will depend on the problem at hand; there is no general principle which can be used. Typically there will be many adequate models, and interest will centre on certain simplest ones, where simplicity can be defined in terms of shape (e.g. the minimum number of local extreme values) or smoothness (minimum total variation of a derivative) or the absence of ‘free lunches’ (minimum Fisher information). The ideas and the applications will be illustrated by several examples, amongst others from the area of nonparametric regression.
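As a rough illustration of how ‘typical’ and ‘look like’ might be turned into a computer program, the following sketch checks a normal model against a data set. It is only one possible operationalization and not the speaker's own procedure: the choice of the Kolmogorov distance as the measure of ‘looking like’, the function names, and the default values of `alpha` and `n_sim` are all assumptions made for this example.

```python
from statistics import NormalDist

import numpy as np


def kolmogorov_distance(data, mu, sigma):
    """Largest gap between the empirical cdf of `data` and the N(mu, sigma^2) cdf."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    model = NormalDist(mu, sigma)
    cdf = np.array([model.cdf(v) for v in x])
    upper = np.arange(1, n + 1) / n - cdf  # empirical cdf just after each point
    lower = cdf - np.arange(0, n) / n      # empirical cdf just before each point
    return float(max(upper.max(), lower.max()))


def is_adequate(data, mu, sigma, alpha=0.95, n_sim=500, seed=0):
    """Hypothetical adequacy check for the model N(mu, sigma^2).

    A proportion `alpha` of simulated data sets is declared 'typical':
    those whose Kolmogorov distance lies below the alpha-quantile of the
    simulated distances. The model is called adequate if the real data
    are at least as close to the model as that threshold.
    """
    rng = np.random.default_rng(seed)
    sims = rng.normal(mu, sigma, size=(n_sim, len(data)))
    sim_dist = np.array([kolmogorov_distance(row, mu, sigma) for row in sims])
    threshold = np.quantile(sim_dist, alpha)
    return bool(kolmogorov_distance(data, mu, sigma) <= threshold)
```

Calling `is_adequate(data, mu, sigma)` for a grid of candidate values of `mu` and `sigma` would give the set of adequate normal models for `data`; as the abstract notes, this set will typically contain many models, and a further notion of simplicity is needed to choose among them.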

References:

Davies, P. L. (1995) Data features. Statistica Neerlandica, (49), 185-245.

Davies, P. L. (2008) Approximating data (with discussion). Journal of the Korean Statistical Society, (37), 191-240.

Tukey, J. W. (1993) Issues relevant to an honest account of data-based inference, partially in the light of Laurie Davies’s paper. Princeton University, Princeton. http://www.stat-math.uni-essen.de/tukey/tukey.php

**Professor Peter Grünwald**, Amsterdam

**The Minimum Description Length Principle**

We give a self-contained introduction to the Minimum Description Length (MDL) Principle, introduced by J. Rissanen in 1978. MDL is a theory of inductive inference, based on the idea that the more one is able to compress a given set of data, the more one can be said to have learned about the data. This idea can be applied to general statistical problems, and in particular to problems of model choice. In its simplest form, for a given class of probability models M and sample D, it tells us to pick the model H ∈ M that minimizes the sum of the number of bits needed to describe first the model H and then the data D, where D is encoded ‘with the help of H’. This is a special case of the general formulation of MDL, which is based on the information-theoretic concept of ‘universal models’, which embody an automatic trade-off between goodness-of-fit and complexity.
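In symbols, the simplest (two-part) form described above can be sketched as follows. The notation L(·) for code length in bits is introduced here for illustration and does not appear in the abstract.

$$
\hat{H} \;=\; \arg\min_{H \in M} \bigl[\, L(H) + L(D \mid H) \,\bigr]
$$

Here L(H) is the number of bits needed to describe the model H, and L(D | H) is the number of bits needed to describe the data D with the help of H. If the data are encoded with the Shannon code associated with H, then L(D | H) = −log₂ P(D | H), so the second term rewards goodness-of-fit while the first penalizes complexity.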

In these lectures, we focus on three aspects:

* Frequentist Considerations – Consistency and Minimax Convergence Rates: MDL model choice and prediction are statistically consistent under a wide variety of conditions. We review A. Barron’s surprisingly simple proofs of these results, which provide a direct link between data compression and statistical convergence rates: each estimator can be interpreted as a code, and the better this code compresses the data in expectation, the faster the estimator’s risk converges.

* Bayesian Considerations – since prior distributions may be interpreted as codes, practical MDL implementations are often quite similar to Bayes factor model selection and model averaging, but there are important differences. For example, the Bayes predictive distribution reappears in MDL, but the Bayes posterior does not. Also, MDL avoids the Bayesian inconsistency results of Diaconis and Freedman, since these are based on priors that provably do not lead to data compression.

* AIC/BIC-dilemma: standard MDL does not achieve the optimal minimax convergence rates in some nonparametric settings. We explain this phenomenon and describe the switch distribution as a potential remedy.

References:

A. Barron, J. Rissanen and B. Yu. The Minimum Description Length Principle in Coding and Modeling. IEEE Transactions on Information Theory 44(6), 2743-2760, 1998.

P. Grünwald. A Tutorial Introduction to the MDL Principle. Chapters 1 and 2 of ‘Advances in Minimum Description Length: Theory and Applications’, MIT Press, 2005.

P. Grünwald. The Minimum Description Length Principle. MIT Press, 2007.

T. van Erven, P. Grünwald and S. de Rooij. Catching up Faster by Switching Sooner: a prequential solution to the AIC-BIC dilemma. Preprint, arXiv:0807.1005, November 2008.

**Professor Nils Hjort**, Oslo

**Focused Information Criterion**

The FIC was developed by Gerda Claeskens and Nils Hjort in two articles in the Journal of the American Statistical Association in 2003. These have since become two of the most cited articles on the problem of model choice. Whereas criteria such as AIC or BIC choose a model without reference to its intended use, the FIC explicitly demands that the use to which the model is to be put be made precise. If, for example, a quantile is of interest, one may choose a different model from the one chosen if the mean were the quantity of interest. If both are of interest for the same data set, then one could choose one model for the quantile and a different one for the mean. Claeskens and Hjort have made this precise in an asymptotic setting and shown how their approach can be validated. They are also able to prove the advantage of model averaging if the results for different models are close together. A further result which comes from their analysis is the calculation of confidence intervals. Many statisticians choose a model on the basis of some criterion and then, having chosen it, calculate confidence intervals neglecting the process by which the model was chosen. This is known to lead to over-optimistic confidence intervals. Claeskens and Hjort have shown how this problem can be overcome within their paradigm so that the confidence intervals have, at least asymptotically, the correct coverage probability.

References:

Hjort, N.L. and Claeskens, G. (2003) Frequentist model average estimators. Journal of the American Statistical Association, (98) 879–899.

Claeskens, G. and Hjort, N.L. (2003) The focused information criterion. Journal of the American Statistical Association, (98) 900–916.

**Professor Christian Robert**, Paris

**Computational approaches to Bayesian model choice**

The seminar will cover recent developments in the computation of marginal distributions for the comparison of statistical models in a Bayesian framework. Although the introduction of reversible jump MCMC by Green in 1995 is rightly perceived as the ‘second MCMC revolution’, its implementation is often too complex for the problems at hand. When the number of models under consideration is of a reasonable magnitude, there exist computational alternatives such as bridge sampling, nested sampling and ABC (Approximate Bayesian Computation), which avoid model exploration with reasonable efficiency. The seminar will be devoted to discussing the advantages and disadvantages of these alternatives.
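For orientation, the quantities being computed can be written as follows. The symbols are introduced here for illustration and are not taken from the abstract: for model k with likelihood f_k, parameter θ_k and prior π_k, the marginal distribution of the data D is

$$
m_k(D) \;=\; \int f_k(D \mid \theta_k)\, \pi_k(\theta_k)\, d\theta_k ,
\qquad
B_{12} \;=\; \frac{m_1(D)}{m_2(D)} ,
$$

where B₁₂ is the Bayes factor comparing models 1 and 2. The integral is rarely available in closed form, which is why numerical approximations of this kind are needed.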

http://fr.arxiv.org/abs/0807.2767

http://www.arxiv.org/abs/0801.3887v2