Statistical Information and Modelling (SIM)

1. Introduction

2. People and activities

3. Description of the research themes

3.1 Statistical Signal and Image Analysis
3.2 Statistics in Biology
3.3 Statistics in Industry

4. International cooperation

5. Former people and past activities

1. Introduction

Mathematical Statistics is an indispensable tool in all fields of modern science. At EURANDOM we focus on themes from various areas presently undergoing vigorous development, and supplying major challenges to statistics: data, signal and image analysis, life sciences, computational learning, industry, and quantum information. Each area presents its own unique types of problem, but the same fundamental ideas from theoretical statistics can be applied in all, giving insight and creating underlying links. The availability of huge amounts of data, having a complex stochastic structure depending on very many unknown parameters, calls for statistical modelling and analysis techniques having a different flavour from classical methodology. Despite modern computational power, the problems require a closer than ever intertwining of algorithms and theory: scientific ambition and the size and complexity of data grow faster than our ability to mechanically process those same data. Statistical optimality and computational feasibility cannot both be achieved at the same time; compromises need to be made, and the guiding principles of classical statistical theory do not necessarily lead to useful solutions. Still, we need to capitalize more than ever on what we have learnt from classical statistical theory, and in particular from asymptotic (large sample) optimality theory.

Underlying and unifying mathematical statistical themes in these areas are:

- high-dimensional statistical modelling
- Bayesian methodology (studied from frequentist perspectives)
- empirical process theory
- asymptotic optimality
- missing data problems and hidden Markov models
- experimental design
- algebraic and geometric methods
- statistical information
- networks.

By concentrating on problems which are, in a wider sense, linked to nonparametric methods in statistics, the SIM-group aims at developing a high degree of cohesion, allowing scientific contacts on an everyday basis between all members of the group. On the other hand, topics like signal extraction (time series), image analysis and statistical learning are sufficiently broad to make it possible to include projects from different disciplines.

Our current research themes are:

Statistical Signal and Image Analysis
Statistics in Biology
Statistics in Industry (algebraic methods, reliability)

Related to industrial statistics, but managed separately from SIM, is the Integrated Batteries project.

In addition, there is activity in the theoretical aspects of statistical learning, which plays a role in each of the above themes, and in quantum information (optimal quantum measurement). We close this section with a discussion of these activities; the three themes are described in Section 3.

Statistical learning is an inspiring interdisciplinary field, with large components in both computer science and mathematical statistics. Reliance on a properly specified stochastic model is much less strong than in classical statistics; rather, one focuses on a well-defined task (e.g., a prediction task) and on a specific loss function, and one develops procedures with approximately optimal long-run behaviour independently of the correct model.
On a more specialised level, statistics in quantum information also influences important questions in statistics. In quantum information, quantum systems (for instance, single photons) are regarded as carriers of information and are used for communication and data processing tasks. This always involves measurement of quantum systems, and using the measurement outcomes to make inference about the state of the system. Typically one can choose between different, incompatible measurements. "Quantum Statistics" is as much concerned with the optimal choice of measurement as with the optimal processing of the data obtained from a given measurement. Within SIM, research focusses on asymptotic theory. In that case, the optimal data processing, given the design of the experiment, can to a large extent be left to classical statistical theory, though often the models are new to statistics and can present new challenges. For instance, quantum tomography can be presented as a variant of classical statistical tomography problems from medical imaging, but prior knowledge about the image to be reconstructed is of a completely different nature in the two domains.
If the sample is going to be large but the experiment is yet to be designed, then we can focus on designing it to optimise the statistical information in the data, as measured by Fisher information. Still, quantum complementarity means that one can only obtain much information about one parameter at the expense of learning only little about another. Moreover, the optimal experiment will depend on the value of the parameter, which is unknown, and hence one must look at adaptive procedures. Experiments in quantum physics are often done simply to prove the incompatibility of what we see in the laboratory with a classical (pre-quantum) description of reality. Here again one can ask what the best experiment is, in order to obtain the expected evidence against classical physics as speedily as possible.
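For reference, the role of Fisher information in this design problem can be written out with the standard textbook definitions. The notation below (the measurement M, the state ρ_θ, the outcome density p_M and the operator L_θ) is introduced here for illustration only and does not come from the text above.

```latex
% Illustrative notation (an assumption, not taken from the page):
% p_M(x;\theta) is the outcome density of measurement M on the state \rho_\theta.
I_M(\theta) = \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial\theta}
              \log p_M(X;\theta)\right)^{2}\right],
\qquad
\operatorname{Var}_\theta(\hat\theta_n) \;\ge\; \frac{1}{n\, I_M(\theta)}
\quad \text{(unbiased } \hat\theta_n \text{ based on } n \text{ i.i.d. outcomes)}.

% Quantum bound: no measurement M can exceed the quantum Fisher information I_Q.
I_M(\theta) \;\le\; I_Q(\theta) = \operatorname{Tr}\!\left(\rho_\theta L_\theta^{2}\right),
\qquad
\frac{\partial \rho_\theta}{\partial\theta}
  = \tfrac{1}{2}\left(L_\theta \rho_\theta + \rho_\theta L_\theta\right).
```

In the one-parameter case the upper bound is attained by measuring the observable L_θ, which itself depends on the unknown θ; this is one way to see why adaptive procedures are needed.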

The Statistical Information and Modelling programme runs in close collaboration with mathematical statisticians of the stochastics groups at the Vrije Universiteit Amsterdam, the Universities of Amsterdam and Utrecht, and Eindhoven University of Technology.

Throughout the year, EURANDOM has openings for postdoc, PhD, sabbatical and visiting positions. Those interested in such positions are kindly asked to send their application to:

Professor O.J. Boxma - Scientific director
EURANDOM
P.O. Box 513
5600 MB Eindhoven,
The Netherlands


2. People and activities

List of present Postdocs, PhD students and Research Fellows

Name                   Position         Period
Dmitry Danilov         Postdoc          05/2006 - 05/2008
Ambedkar Dukkipatti    Postdoc          04/2007 - 04/2008
Peter Grünwald         Research Fellow  01/2005 - 01/2006
Efang Kong             Postdoc          02/2007 - 02/2009
Alexander Ledovskikh   Postdoc          09/2007 - 04/2009
Guangming Pan          Postdoc          06/2007 - 06/2009
Shota Gugushvili       Postdoc          01/2008 - 01/2010

Steering Committee
An international steering committee oversees the SIM-programme:

- P. Donnelly - University of Oxford, United Kingdom
- P. Green - University of Bristol, United Kingdom
- U. Gather - University of Dortmund, Germany
- M. Newby - City University, United Kingdom
- S. Tavaré - University of Southern California, United States of America
- A. Tsybakov - Université Paris VI, France

Various activities

- Informal meeting Eindhoven statisticians - biweekly. Organisers: S. Kuhnt and S. Di Bucchianico

Reports 2006

2006-001
Minimax and adaptive estimation of the Wigner function in quantum homodyne tomography with noisy data
L. Artiles, C. Butucea, M. Guta

2006-002
Penalized empirical risk minimalization

2006-013
Estimation of the reaction efficiency in Polymerase Chain Reaction
N. Lalam

2006-018
Pseudo maximum likelihood estimation for differential equations
N. Lalam, C. Klaassen

2006-023
Factorial Designs and Harmonic Analysis on Finite Abelian Groups
P. van de Ven, A. Di Bucchianico

2006-030
Title t.b.a.
F. Rigat

2006-034
Bullet-proof math

2006-035
On nonnegative garrote estimator in a linear regression model

3. Description of the research themes

3.1 Statistical Signal and Image Analysis

• P.L. Davies (TU/e and University of Duisburg - Essen, Germany)
• M.N.M. van Lieshout (CWI, Amsterdam)

Research is concentrated on high dimensional or infinite dimensional parameter spaces as they occur in nonparametric regression and signal extraction for non-linear systems. The emphasis is on signal extraction for time series and the analysis of two- and three-dimensional images, but related topics from other areas which also involve high dimensional spaces will be included. Work in these areas combines theoretical considerations, which clarify the performance of the procedures by subjecting them to a mathematical analysis, with the development and implementation of algorithms so that they can be applied to data sets found in practice. An important and as yet little developed area is the asymptotic analysis of algorithms, as an increasing number of procedures are defined in terms of an algorithm rather than as the solution of some extremal problem.

The research group works on low-level de-noising, intermediate-level segmentation algorithms and benchmarking, as well as high-level image and video interpretation problems. The methodologies used vary widely, from splines to linear and quadratic programming problems and automatic smoothing using diffusion equations. We use tools and concepts from stochastic geometry, such as marked point and object processes, and Markov chain Monte Carlo theory and methods. The problems in image analysis are many and varied, from identifying peaks and edges to linear and non-linear inverse problems. Signal extraction concentrates on time series, both in one and several dimensions, with applications in fields from financial data to the on-line monitoring of intensive care patients. Here again the development and implementation of algorithms will be of great importance, in some cases with the emphasis on speed for on-line application rather than on the impractical calculation of some optimal statistic. Methods from robust statistics have a role to play in both areas.

The group has good contacts with other colleagues working on related problems: A.J. Baddeley (University of Western Australia, Australia), X. Descombes (INRIA Sophia Antipolis, France), L. Dümbgen (University of Berne, Switzerland), U. Gather (University of Dortmund, Germany), P. Gregori (University Jaume I, Spain), O. Häggström (Chalmers University of Technology, Sweden), U. Hahn (University of Augsburg, Germany), R. Huele (Leiden), E.B.V. Jensen (University of Aarhus, Denmark), W.S. Kendall (University of Warwick, UK), R. Kluszczyński (Nicolaus Copernicus University, Poland), A. Kovac (University of Bristol, UK), V. Liebscher (University of Greifswald, Germany), J. Mateu (University Jaume I, Spain), J. Möller (University of Aalborg, Denmark), I.S. Molchanov (University of Berne, Switzerland), E.J. Pebesma (Utrecht), T. Schreiber (Nicolaus Copernicus University, Poland), V. Spokoiny (Weierstrass Institute, Berlin, Germany), A. Stein (Wageningen), R.S. Stoica (INRA Avignon, France), E. Thönnes (University of Warwick, UK), G. Winkler (GSF, Munich, Germany), J. Zerubia (INRIA Sophia Antipolis, France), S.A. Zuyev (University of Strathclyde, UK), and E.W. van Zwet (Leiden).

3.2 Statistics in Biology

• M.C.M. de Gunst (Vrije Universiteit, Amsterdam)
• C.A.J. Klaassen (University of Amsterdam)

Molecular biology, genetics, cell biology, and systems biology generate enormous challenges for statisticians in the 21st century, as scientists try to understand ever better the pathways from DNA to living organism. At EURANDOM we work closely with biologists in a number of concrete research projects. The emphasis lies on stochastic modelling and on the interplay of theory and application. At present the main topics are:

• Statistical problems in genomic mapping; more specifically, semiparametric copula models in twin research, with D.I. Boomsma, Biological Psychology, VUA; power of scan statistics for genetic linkage detection; and modelling and statistics for networks in biology, with D.O. Siegmund, Statistics, Stanford (USA);

• Modelling and statistical analysis of developmental gene networks with J. Kaandorp, Computer Science, Universiteit van Amsterdam (UvA), and J. Reinitz, Applied Mathematics and Statistics/Developmental Genetics, Stony Brook University, USA;

• Spatio-temporal modelling and analysis of neuronal activity patterns with A.B. Brussaard, A. van Ooyen, Experimental Neurophysiology, VUA, and J. van Pelt, Neurons and Networks, Netherlands Institute for Brain Research.

3.3 Statistics in Industry

• A. di Bucchianico (TU/e)

EURANDOM aims to be an active player in statistics in industry. It played a role in the formation of ENBIS, the European Network for Business and Industrial Statistics, and is a partner in the EU Thematic Network pro-ENBIS, which grew out of ENBIS. It has contributed to all the ENBIS annual conferences, and Talía Figarella (PhD student) won the prize for the best presented paper at the 2003 conference in Barcelona. Henry Wynn was the founding president of ENBIS and, with colleagues at EURANDOM, is active in the pro-ENBIS work packages. Other international activities include the hosting of Europe's major international workshops on algebraic statistics (GROSTAT 3, September 1999) and on optimal design of experiments (mODa, June 2004). The two main scientific themes are Algebraic Statistics and Reliability. Related to industrial statistics, but managed separately from SIM, is the Integrated Batteries project (I-BAT).

• Algebraic Statistics

Following success with the application of Gröbner bases to the design of experiments (Pistone and Wynn, Biometrika 1996), the subject of algebraic statistics has been adopted as a main area. The research concentrates on three specific topics:

1. Design of Experiments (DOE)
Industrial practice requires experimental designs for studying the impact of factors on both the mean and the variance of response variables. Several approaches exist in the literature, but a general framework is not yet known. SIM is developing a new algebraic approach, including algorithms.

2. Contingency tables
Hypothesis testing on contingency tables using asymptotics does not work well in many practical cases. Following the seminal work by Diaconis and Sturmfels, there is a growing interest in using algebraic and geometric methods for MCMC methods, ML estimation and identifiability problems in this area. Within SIM the emphasis is on the latter two topics.

3. Statistical learning
We are exploring possibilities to apply algebraic methods to statistical learning, in particular, the design of kernels for soft sensors.
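As background for the contingency-table topic, the MCMC idea attributed above to Diaconis and Sturmfels can be sketched for the simplest case, a two-way table tested for independence. This is a minimal illustration under stated assumptions, not SIM's own method: the function names (`chi_square`, `metropolis_step`, `exact_p_value`) and all parameter defaults are hypothetical, the table is assumed to have at least two rows and two columns, and all row and column sums are assumed positive. For two-way tables the moves that add +1 and -1 on the corners of a 2x2 sub-table connect all tables with the same margins, and a Metropolis correction targets the hypergeometric distribution conditional on those margins.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for independence in a two-way table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def metropolis_step(table, rng):
    """One Metropolis step using a basic +1/-1 move on a random 2x2 sub-table.

    The move keeps all row and column sums fixed. The target distribution is
    proportional to 1 / prod(n_ij!), i.e. the hypergeometric distribution
    of the table given its margins.
    """
    n_rows, n_cols = len(table), len(table[0])
    i1, i2 = rng.sample(range(n_rows), 2)
    j1, j2 = rng.sample(range(n_cols), 2)
    up1, up2 = table[i1][j1], table[i2][j2]      # cells that would gain 1
    down1, down2 = table[i1][j2], table[i2][j1]  # cells that would lose 1
    # Ratio of target probabilities; it is 0 if a cell would become negative.
    ratio = (down1 * down2) / ((up1 + 1) * (up2 + 1))
    if rng.random() < ratio:
        table[i1][j1] += 1
        table[i2][j2] += 1
        table[i1][j2] -= 1
        table[i2][j1] -= 1


def exact_p_value(observed, n_steps=20000, burn_in=2000, seed=0):
    """Monte Carlo estimate of the conditional (exact) p-value."""
    rng = random.Random(seed)
    table = [list(row) for row in observed]
    observed_stat = chi_square(observed)
    hits = 0
    for step in range(burn_in + n_steps):
        metropolis_step(table, rng)
        # Small tolerance guards against floating-point ties.
        if step >= burn_in and chi_square(table) >= observed_stat - 1e-12:
            hits += 1
    return hits / n_steps


if __name__ == "__main__":
    print(exact_p_value([[3, 1, 2], [0, 4, 1]]))
```

For higher-way tables or more complex log-linear models the simple 2x2 moves no longer suffice, and this is where Gröbner-basis computations of a Markov basis enter.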

• Reliability

Research on reliability concentrates on two subthemes: Signature Analysis and Software Reliability.

1. Signature Analysis
This important area is a critical part of the wider area of "end of life" analysis, which seeks to analyse used products for reuse. This is driven by the new EU WEEE directives to avoid environmentally costly landfill. The art is to be able to quickly detect performance deterioration and predict product life beyond first use, using a blend of designed experiments and signal processing. The Quality and Reliability Engineering group of Aarnout Brombacher with the Department of Technology Management at TU/e is also actively involved in this project.

2. Software Reliability
The Statistical Testing and Reliability Estimation of Software Systems (STRESS project).
As an extension of the current (hardware) activities in reliability, A. Di Bucchianico has initiated a collaboration on software reliability with the newly founded LaQuSo (Laboratory for Quality Software) of the Department of Mathematics and Computer Science at TU/e. The goal of this collaboration is to develop software testing strategies that, on the one hand, incorporate structural knowledge of software using modern computer science models and, on the other hand, apply modern statistical methods. A multidisciplinary project proposal was awarded a grant by NWO for 2 PhD students (1 in computer science at LaQuSo, 1 in statistics at EURANDOM). The PhD students started in June 2005. Initial results by A. Di Bucchianico, K. van Hee and J.F. Groote, on release strategies that guarantee error freeness with a certain confidence, have been presented at several software testing conferences.

4. International cooperation

Members of SIM participate in the EU network programmes pro-ENBIS and ENBIS (European Network for Business and Industrial Statistics), PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning), RESQ (Resources for Quantum Information) and MUSCLE. In addition to these formal cooperations, there are numerous individual-level cooperations with scientists throughout Europe and the USA.

SIM is a founding member of "QRandom", an informal international consortium that has organised a sequence of workshops in Aarhus (1999), EURANDOM (2001), Dresden (2002) and Aarhus (2003), on stochastics in quantum information and quantum measurement.

Future workshop(s)

• September 24-26, 2007
Algorithms in Complex Systems, workshop within the framework of the Network of Excellence (NoE) PASCAL

• October 8-12, 2007
Workshop "YES-I" (Young European Statisticians) 2007
Large Shape Restricted Inference

5. Former people and past activities

Former Postdocs, Ph.D. students and Research Fellows

Statistical Information and Modelling (previously CSM, AS, CMB)

No.  Name                     Position           Period
 1.  Nicola Armstrong         Postdoc            02/2002 - 04/2004
 2.  Luis Artiles Martinez    Postdoc            01/2002 - 12/2004
 3.  Isaac Corro Ramos        Postdoc            07/2005 - 07/2009
 4.  Bojan Basrak             Postdoc            07/2000 - 07/2003
 5.  Wicher Bergsma           Postdoc            09/2003 - 09/2005
 6.  Julia Brettschneider     Postdoc            01/2001 - 08/2001
 7.  Nicolas Brunel           Postdoc            07/2006 - 12/2006
 8.  Cheikh Diack             Postdoc            11/1998 - 11/2000
 9.  Sandro Di Bucchianico    Senior Researcher  09/1998 - 09/2001
10.  Peter Grünwald           Postdoc            11/1999 - 11/2001
11.                           Research Fellow    01/2006 - 01/2007
12.  Madalin Guta             Postdoc            01/2002 - 01/2004
13.                           Research Fellow    01/2004 - 01/2005
14.  Farida Enikeeva          Postdoc            05/2003 - 05/2005
15.  Talía Figarella          PhD student        06/2003 - 09/2006
16.  Sonia Hernandez-Alonso   Postdoc            09/2000 - 09/2003
17.  Roxana Ion               Postdoc            06/2001 - 12/2002
18.  Alexey Koloydenko        Postdoc            10/2002 - 10/2005
19.  Vladimir Kulikov         Postdoc            05/2003 - 02/2005
20.  Nadia Lalam              Postdoc            02/2004 - 11/2006
21.  Jüri Lember              Postdoc            02/2001 - 08/2003
22.  Andries Lenstra          Postdoc            01/2001 - 01/2003
23.  Patrick Lindsey          Postdoc            09/2001 - 09/2003
24.                           Research Fellow    01/2005 - 01/2006
25.  Leila Mohammadi          Postdoc            11/2004 - 11/2006
26.  Nino Mushkudiani         Postdoc            09/2001 - 09/2003
27.  Eva Riccomagno           Postdoc            07/1999 - 03/2001
28.  Fabio Rigat              Postdoc            09/2004 - 09/2006
29.  Peter van de Ven         PhD student        02/2003 - 03/2007
30.  Brandon Whitcher         Postdoc            09/1998 - 09/2000
31.  Jian Zhang               Postdoc            06/1999 - 09/2002

Visitors

Name            Affiliation                        Country         Period
W. Khamaladze   Victoria University                New Zealand     January 29 - February 2, 2006
J. Kahn         ENS                                France          February 6 - 15, 2006
S. Kuhnt        Dortmund University                Germany         February 21 - 24, 2006
F. D'Alché-Buc  IBISC                              France          August 30 - 31, 2006; November 20, 2006
M. Viana        University of Illinois at Chicago  USA             December 6 - 14, 2006
S. Kuhnt        Dortmund University                Germany         March 2, 2005
M. Viana        University of Illinois at Chicago  USA             March 10 - 24, 2005
M. Lupparelli   University of Florence             Italy           April 4 - 18, 2005
M. Huskova      Charles University Prague          Czech Republic  June 9 - 21, 2005
S. Vidal Puig   Technical University of Valencia   Spain           October 20 - December 21, 2005
E. Khamaladze   Victoria University                New Zealand     December 6 - 12, 2005
T. de Bie       Katholieke Universiteit Leuven     Belgium         December 19 - 20, 2005

Workshops

Date                   Workshop
October 8-12, 2007
September 24-26, 2007
April 16-18, 2007      NEST II Mathematical Methodologies for Operational Risk
December 11-13, 2006   Image Analysis and Inverse Problems
January 16-18, 2006    Statistics for biological networks
October 3-5, 2005      PASCAL - Modelling in Classification and Statistical Learning
June 24, 2005          Regional Meeting on Design of Experiments (DOE) and Statistical Process Control (SPC)
March 14-18, 2005      Mini-course Symmetry Studies (in cooperation with EIDMA)

Reports

For information about reports of previous years and/or downloads of abstracts and reports have a look at the EURANDOM reports page.

Last update: September 3 2008
MB