Cluster Plan
STAR -- Stochastics - Theoretical and Applied Research
MATHEMATICS CLUSTER STOCHASTICS
1. Motivation
What is stochastics?
The stochastics cluster STAR
What research topics is the stochastics cluster targeting?
2. Participating staff
3. Structure
3.1 Goals
3.2 Organisation
3.3 Collaboration
3.4 Budget
4. Education
5. Science, industry and society
6. Research projects
6.1 General methodology
6.2 Mathematical statistical physics
6.3 Stochastics and the life sciences
6.4 Stochastic networks
6.5 Stochastic finance and econometrics
1. Motivation
What is stochastics?
Randomness is key to phenomena as diverse as phase transitions in
polymer chains, resource demands in computer networks, or data variation
in micro-array experiments. Stochastics is the science of randomness. It
is a branch of mathematics that builds general tools and theories that
enable us to understand, predict and often control the numerous
phenomena that are subject to chance.
Stochastics is a multidisciplinary science that takes its motivation
from a wide range of scientific fields and industrial applications.
Examples are: aging in disordered materials, fluctuations of interest
rates, congestion in telephone lines or road traffic, effect of air
pollution on health, instability in logistic processes, climate change,
or genetic determinants of diseases. Mathematical abstraction allows us
to extract a common denominator for the chance mechanisms inherent in
such phenomena. Building a unified mathematical body of concepts and
theories relating to randomness has turned out to be extremely useful at
every stage, from constructing, understanding and analysing models to
fitting and optimising them.
Amid the rapid technological advances of the past few decades,
complexity has become a keyword alongside randomness. Technology
enables us to study in ever finer detail the various processes occurring
in nature, and to build sophisticated instruments to monitor and
influence these processes. In coping with complexity, mathematics –
alongside the natural and life sciences – plays a crucial role. As part
of mathematics, stochastics is particularly well equipped to model,
analyse and optimize complex systems, whether because such systems are
intrinsically random or because a probabilistic description allows one
to capture their essential features.
Stochastics encompasses the areas of probability theory, statistics and
stochastic operations research. Probability theory builds up the
mathematical framework to describe and interpret complex and random
systems, statistics provides the methods and tools to properly handle
and interpret the data drawn from these systems, while stochastic
operations research offers ways to optimize and control their
performance. These capabilities make stochastics an essential enabling
technology. Conversely, the dynamic developments in the various areas
present a challenging research agenda for stochastics. Very often, it is
the fundamental work that leads to the deepest insight and the broadest
range of applicability.
The stochastics cluster STAR
Stochastics is important both as an
enabling technology and as a scientific discipline. Stochastics is
presently flourishing internationally. The Dutch stochastics community
is strong and thriving, as is evidenced by its high reputation,
visibility and level of activity, both nationally and internationally.
Over the years, it has built up a broad spectrum of active working
relations with researchers from physics, biology, medicine, economics
and industry. In addition, it has begun to coordinate its MSc and PhD
educational programs at the regional and national levels. It aims to
contribute to society, while engaging in stochastics as an important and
exciting field of scientific research.
The establishment, in 1998, of the internationally oriented research
institute Eurandom has given a boost to Dutch stochastics. Its workshop
and visitor programs have drawn the best researchers in the field
worldwide to The Netherlands, and its postdoc program has attracted a
large number of highly talented postdocs from abroad – quite a few of
whom have subsequently accepted a tenured position at a Dutch university
or company.
In a number of areas Dutch stochastics is of internationally recognized
excellence. However, Dutch stochastics is relatively small and its
reputation hinges upon a small number of senior researchers. To keep the
momentum, it is essential to offer a stimulating and attractive research
environment to a new generation of talented researchers. In a few fields
of vital importance Dutch stochastics is under-represented and a strong
impetus is needed. Examples are biostatistics and stochastic finance.
These are fields that, internationally, are going through a period of
feverish activity and generate a large demand for well-trained
probabilists and statisticians who can contribute to the ensuing
application areas.
The stochastics cluster STAR, with
the European research institute Eurandom in a coordinating role,
pushes Dutch stochastics further to the international forefront and
allows the Dutch stochastics community to attract and train the most
talented students and young researchers worldwide. It gives a much
needed impulse to under-represented fields and strengthens the existing
leading position in well-represented fields. The cluster leads to
more critical mass in stochastics, and further stimulates the
interaction of researchers from probability theory, statistics and
stochastic operations research with researchers from other disciplines.
The cluster plan is ambitious, but it should be realized that similarly
ambitious plans are being developed in places like Berlin, Paris,
Zürich, Berkeley and Vancouver. For example, the University of British
Columbia in Vancouver has 5 full professors in probability and aims to
have 10. It intends to establish “the world’s leading center
for research and graduate training in stochastic science”, joining
forces with the Universities of Victoria and Washington, the Microsoft
Theory Group in Seattle, and the Pacific Institute for the Mathematical
Sciences. This is spurred by the belief that “we see the dawning of the
age of stochasticity in every aspect of basic science and its
applications, affecting virtually all of science in this century.” The
best way in which The Netherlands can continue to collaborate on an
equal footing with the leading centers in the world, and be able to keep
its most talented probabilists and statisticians, is to join forces in a
stochastics cluster.
What research topics is the stochastics cluster targeting?
There is great potential for a comprehensive activity in a few
well-chosen areas of stochastics. The cluster aims for a coordinated
research effort in the following five topics:
(1) General methodology
(2) Mathematical statistical physics
(3) Stochastics and the life sciences
(4) Stochastic networks
(5) Stochastic finance and econometrics
The remainder of this text is
organized as follows. In Section 2 we list the participating staff. In
Section 3 we describe the structure of the cluster, specify the main
goals, and sketch how we intend to reach them. Educational aspects are
discussed in Section 4. The contribution to industry and society at
large is the subject of Section 5. In Section 6 we add a list of
research projects that will be addressed by the cluster. These projects
are grouped under the 5 topics mentioned above and represent the best
the cluster has to offer in terms of importance, viability and strength.
2. Participating staff
The cluster has Eurandom as its central, coordinating node. Apart from
that, we have not put emphasis on institutes as research nodes, but
rather on leading senior researchers in stochastics in The Netherlands.
Below is a (non-exhaustive) list of key researchers who are already
involved in the cluster activities in some way, with their first
affiliation and, where present, further affiliations (several of them
are advisors at Eurandom or CWI, or part-time professors at another
university).
• I.J.B.F. Adan (TU/e + Eurandom + UvA)
• J. van den Berg (CWI + VU)
• S.C. Borst (TU/e + Eurandom)
• R.J. Boucherie (UT + Eurandom)
• O.J. Boxma (TU/e + Eurandom)
• F. Camia (VU)
• F.M. Dekking (TUD)
• J. Einmahl (UvT)
• A. van Enter (RUG)
• A.J. van Es
• R. Fernandez (UU)
• R.D. Gill (UL)
• A. Gnedin (UU)
• M.C.M. de Gunst (VU + Eurandom)
• R. van der Hofstad (TU/e + Eurandom)
• W.Th.F. den Hollander (UL + Eurandom)
• G. Hooghiemstra (TUD)
• G. Jongbloed (TUD + Eurandom)
• C.A.J. Klaassen (UvA + Eurandom)
• G.M. Koole (VU)
• C. Külske (RUG)
• R. Laeven (UvA + Eurandom)
• J.S.H. van Leeuwaarden (TU/e + Eurandom)
• M.C. van Lieshout (CWI + Eurandom + TU/e)
• M.R.H. Mandjes (UvA + Eurandom)
• R.W.J. Meester (VU)
• R.D. van der Mei (CWI + VU)
• J. van Neerven (TUD)
• R. Núñez-Queija (CWI + UvA)
• F. Redig (RUN + CWI)
• M. Schröder (VU)
• V. Sidoravicius (CWI + UL + Eurandom)
• P.J.C. Spreij (UvA)
• A.W. van der Vaart (VU)
• M. Vlasiou (TU/e + Eurandom)
• M.A. van de Wiel (VU)
• J.H. van Zanten (TU/e)
• A.P. Zwart (CWI + VU + Eurandom)
Researchers appointed on STAR grants in 2010/2011:
• A.C. Fey
• W. Ruszell
• B.T. Szabo
• J.P. Dorsman
• M.R. Schauer
• M. Heydenreich
• B. Ros
Although this list is restricted to researchers who presently work in
The Netherlands, we would like to emphasize that the cluster will be
strongly internationally oriented. All the above-mentioned researchers
have active collaborations with researchers abroad, and many of them
participate in international networks. For example, the European
research institute Eurandom presently participates in a European Network
of Excellence, in a project of the European Investment Bank, in an FP7
project, in a Marie Curie project, and in a large bilateral Dutch-German
research program funded by NWO and DFG. Most of these projects are
carried out by postdocs who come from all over the world. Similar
remarks apply to the other research groups involved.
3. Structure
3.1 Goals
The goals of the cluster are:
- To further strengthen the quality of Dutch stochastics research,
    including underrepresented areas like biostatistics and stochastic
    finance; to enhance its coherence; and to increase its visibility
    (see the remainder of Section 3).
- To have a strong impact on stochastics education at the MSc and PhD
    levels (see Section 4).
- To make a major contribution to the analysis and optimisation of
    complex and random systems arising in science, industry and society
    at large (see Section 5).
- To do top-level research in a number of key areas of stochastics
    (see Section 6).
3.2 Organisation
Eurandom acts as the coordinating and
facilitating node of the cluster. Eurandom is a research institute in
the area of stochastics and its applications, located in Eindhoven. It
has no tenured research staff, being predominantly a postdoc institute,
with 20 postdocs (coming from all over the world) and 5-7 PhD students
in temporary appointments. Eurandom has been operational since 1998, and
has rapidly built up a very strong reputation as an institute with “an
extremely stimulating research environment, a dynamic and high-quality
research program, a strong visitor program and a very extensive lecture
and workshop program”, according to a review by an international panel
in 2005. The stochastics cluster will enable Eurandom to maintain and
strengthen its role as a stochastics facility that organizes scientific
meetings, attracts leading researchers to The Netherlands, and
facilitates teaching at the MSc and PhD level.
As mentioned above, all the leading Dutch senior researchers in
stochastics participate in the cluster. Some of the institutes with
which they are affiliated will appoint a young researcher at the
assistant professorship level, guaranteeing continuation of the funding
of these positions when the term of the cluster has ended. There are
several extremely talented young Dutch researchers in stochastics
presently working abroad. The cluster will offer an excellent
opportunity to bring some of them back to The Netherlands.
Eurandom and CWI will use part of the cluster money for hiring postdocs,
as an effective and flexible way of providing a stimulus to Dutch
stochastics. Experience has taught that:
(i) postdoc positions attract talented researchers from abroad, who
often stay in The Netherlands;
(ii) a postdoc period, without substantial teaching obligations, is a
crucial step forward in the career of a promising young researcher.
The (financial) administration of the cluster will be placed at
Eurandom. The overall direction and research quality of the cluster will
be supervised by a Scientific Committee, which will initially consist of
the following persons:
- O.J. Boxma (TU/e + Eurandom)
- W.Th.F. den Hollander (UL + Eurandom)
- R.D. van der Mei (VU + CWI)
- C.A.J. Klaassen (UvA + Eurandom)
- A.W. van der Vaart (VU + Eurandom)
Each of the five research topics, described in Sections 6.1–6.5, will
have a coordinator or coordinating team, initially:
(1) R.D. Gill (UL)
(2) R.W. van der Hofstad (TU/e + Scientific Director Eurandom)
(3) M.C.M. de Gunst (VU + Eurandom)
(4) M.R.H. Mandjes (UvA + Eurandom)
(5) P.J.C. Spreij (UvA), M. Schröder (VU)
The Educational Committee is
responsible for the coordination of the educational activities in the
MSc and PhD programs. It coordinates the MSc program activities with the
directors of education at the participating institutes, the national
“regie-orgaan” of MasterMath, regional collaborations such as Stochastics
and Financial Mathematics (S&FM), and the Dutch graduate network in
Operations Research (LNMB). For the courses at the PhD level, it
coordinates with LNMB and with the aio-network in stochastics. The
initial committee consists of:
- I.J.B.F. Adan (TU/e + Eurandom)
- R.J. Boucherie (UT + Eurandom)
- F. Redig (RUN + CWI)
- J.H. van Zanten (TU/e + Eurandom)
3.3 Collaboration
For each of the five topics, there is a more or less fixed day on which
the participants meet and run a joint seminar. These meetings mainly
take place at Eurandom and in Amsterdam. Office space is made available
at the institutes hosting the seminars to accommodate the weekly
visitors. For each of the five topics, there will also be at least one
workshop per year, including “user days” with participants from
industry, science and society at large. There will also be national
events, including the annual Lunteren conferences that already have a
strong tradition in The Netherlands.
3.4 Budget
For the two-year period 2010-2011, STAR has received 750 k€ per year
from NWO, which was allocated to 11 projects in 2010. These projects are
shown in Table 1; the financial breakdown for 2010-2011 is as follows.
| STAR budget in 2010-2011 | Positions × k€ | k€ per year |
|---|---|---|
| Tenure track | 4 × 67.5 | 270 |
| Postdoc | 2 × 60 | 120 |
| PhD | 5 × 50 | 250 |
| Advisor | | 10 |
| Workshops, visitors | | 60 |
| Outreach, administration | | 40 |
| Total | | 750 |
The plans from 2012 onwards encompass investment in permanent positions,
and funding for workshops, visitors, seminars, exchanges and
administration. The total costs are 1480 k€ per year. The financial
breakdown is shown below.
| Requested STAR budget from 2012 | Positions × k€ | k€ per year |
|---|---|---|
| Tenure track | 8 × 70 | 560 |
| Full professor finance | 1 × 100 | 100 |
| Tenure track finance | 2 × 70 | 140 |
| Postdoc | 6 × 60 | 360 |
| Workshops | | 150 |
| Special months and exchanges | | 60 |
| Visitors | | 50 |
| Seminars | | 30 |
| Outreach, administration | | 30 |
| Total | | 1480 |
4. Education
The cluster will take a leading role in the teaching of stochastics at
the level of MSc and PhD students. A coherent program of courses will be
offered, making use of successful existing programs wherever possible.
In particular, the cluster will become a partner in the following
existing activities:
- The Dutch graduate network of Operations Research (LNMB) offers an
    extensive and coherent program in the area of stochastic operations
    research at the MSc and PhD level: 7 MSc courses (with exams) are
    offered by LNMB each year, while 18 PhD courses (with extensive
    homework exercises and assignments) are presented in a biennial
    cycle (see http://www.math.leidenuniv.nl/~lnmb/). These programs have
    given a strong impetus to the field of Operations Research, providing
    better training to students and strengthening interactions between
    staff and students, as well as among students.
- The master program Stochastics and Financial Mathematics (S&FM),
    currently run by the VU, UvA, UU and UL, offers a broad master program
    in stochastics and its applications, including finance and the life
    sciences (see http://www.math.vu.nl/sto/onderwijs/sfm/). The program
    coordinates the master courses of the participating universities
    (some 15-20 courses every year), which include both basic master
    courses and more specialized courses, some of which are partly at
    PhD level.
- MasterMath is the national collaboration in mathematics education and
    contains several stochastics modules, including the 3TU program (see
    http://www.mastermath.nl/).
These activities will not fall
entirely under the cluster, because they include areas outside
stochastics (the LNMB also covers deterministic operations research,
while the MasterMath is concerned with mathematics in general) and
because they are partly of a regional character. However, the cluster
will take responsibility and put these activities in a wider
perspective:
- The cluster will propose the stochastics core curriculum and its
    lecturers to the MasterMath ‘regie-orgaan’.
- The cluster will coordinate regional and local programs with the aims
    of increasing the efficiency of stochastics education and ensuring
    that all key subjects in stochastics are taught on a regular basis in
    The Netherlands. In the near future, for instance, the cluster will
    strive for an increase in the number of courses in the area of
    stochastics and the life sciences.
- The cluster will organize minicourses, workshops and other special
    activities.
- The cluster will advertise The Netherlands as an excellent place for
    studies in stochastics.
Eurandom will act as the facilitating node, providing administrative
support. National courses will mainly be given in Amsterdam.
Eurandom has organized many minicourses and tutorials in the past years,
and will continue this tradition in the coming years. Lecturers have
included the Eurandom chairs (leading researchers appointed as visiting
professors at Eurandom) and selected Stieltjes professors (appointed by
the research school Thomas Stieltjes Institute of Mathematics).
Eurandom is hosting a series of
“Young European Probabilists (YEP)” workshops. These workshops focus on
a single topic, and are organized for and by young researchers. With the
exception of two keynote speakers, the participants are researchers
shortly before or shortly after completing their PhD. The YEP
workshops have been highly successful, and have drawn many talented
young researchers to The Netherlands. Recently, Eurandom has also
organized workshops with a similar scope in statistics and stochastic
operations research. The cluster will enable Eurandom to continue and
extend this fruitful initiative.
Minicourses for PhD students are also organized by the aio-network
Stochastics at its yearly “Hilversum” spring meetings, which aim to
strengthen communication between the PhD students in stochastics at the
various Dutch universities (http://www.math.vu.nl/~stochgrp/aionetwerk/).
The activities of the aio-network will be brought under the
responsibility of the cluster.
Minicourses in finance, for PhD students, researchers and people in
industry, are given at the yearly Winter school in finance in Lunteren
(see http://staff.science.uva.nl/~spreij/stieltjes/winterschool.html).
5. Science, industry and society
In Section 1 many examples were
mentioned of situations in which probabilistic modelling is important.
Thus it is not surprising that many researchers in the stochastics
cluster have ties with researchers in other sciences, industry,
government research institutes, or society at large. These and new ties
will be actively pursued within the cluster. The cluster as a whole will
function as an access point for expertise in the broad area of
stochastics, and also as a training center in stochastic modelling for a
new generation of researchers. In Section 1 we have listed the five
topics on which the cluster will focus its attention. As argued in that
same section, stochastics takes its motivation from a wide range of
scientific fields, and once stochastic models and methods have been
developed for one field they are often applicable to other fields as
well. Hence we shall remain alert to interesting problems in fields like
astronomy, chemistry and materials science (with, among others,
fascinating problems in stochastic geometry), social sciences and law,
even if these fields do not feature prominently in the list of research
projects in Section 6.
The cluster received support letters
from the following persons/organisations:
- Prof.dr.ir. G.J. van Oortmerssen -
Director of TNO-ICT
- M.A. van den Brink - Executive
Vice-President, ASML
- The Scientific Council of Eurandom
Several more letters will be
provided later on.
To illustrate the wide range of applications in which the cluster is
involved, an (incomplete) list of current projects that have already led
to output in the form of joint publications or implementation of
methodology is given below.
- Ion channel kinetics. B. van
Duijn, Fytagoras Plant Science BV, Leiden
- Neuroscience, A.B. Brussaard, A.B.
Smit, Department of Biology, VU, J. Verhaagen, Department of
Neuroregeneration, Netherlands Institute for Neuroscience
- Carcinogenesis, E.G. Luebeck, Fred
Hutchinson Cancer Research Center, Seattle
- Genomics, Proteomics, B. Ylstra, Faculty of Biology, VU, G. Meier,
    VU Medical Centre, E. Marchiori, Dept. of Computer Science, A.B. Smit,
    Department of Biology, VU, C. Jimenez, VU Medical Centre
- Statistical Genetics, Biological
Psychology, D.I. Boomsma, Faculty of Psychology, VU, P. van
Dommelen, TNO-Leiden, P. Heutink, VU Medical Centre
- Medical Imaging (PET, MEG), R.
Boellaard, A.A. Lammertsma, C. Stam, VU Medical Centre
- Detecting effects of attachment
therapy to disabled children, C.G.C. Janssen and C. Schuengel, Dept.
of Special Education, VU
- Epidemiology, Biostatistics, J.
Robins, Harvard School of Public Health
- Infectious animal diseases, H.
Heesterbeek, Department Animal Medicine, UU, G. Boender, D.
Klinkenberg, M. de Jong, IDDLO-Lelystad
- Forensic science, M. Sjerps,
Forensic Institute, Rijswijk
- Batch-quality of horticultural
products, O. van Kooten, L.L.M. Tijskens, Horticultural Production
Chains Group, Wageningen University
- Statistical process control,
R.J.M.M. Does, IBIS UvA BV
- Risk management, A. Lucas, Faculty
of Economy and Business Sciences, VU
- Wireless networks, various projects, among others with: M. Cook,
    J. Bruck, Dept. Electrical Engineering; M. de Graaf (Thales);
    Ph. Whiting, P. Gupta (Lucent Technologies, Bell Labs, Murray Hill);
    J. Wieland (Vodafone); J.L. van den Berg, R. Litjens et al. (TNO-ICT)
- Wired networks, Surfnet, TNO-ICT,
WorldCom, Lucent Technologies
- GRID networking, H. Bal, Th. Kielmann, Dept. Computer Science, VU
- Performance evaluation of ad-hoc
networks, J.L. van den Berg, TNO-ICT
- Performance analysis of
computer-communication networks, Projects on (i) Cable access
networks, (ii) Networks-on-Chips, (iii) Mesh networks. T.J.J.
Denteneer, A.J.E.M. Janssen, V. Pronk et al., Philips Research
Laboratories
6. Research projects
6.1 General methodology
The title of this subsection, uninformative as it may be, reflects the
fact that an important part of stochastics research is directed at the
development and understanding of models and methods that are
“universally” applicable. For instance, it is clear that the application
of methods from Bayesian statistics has increased dramatically in the
past decade, in almost every area where statistics is important, including
the other four research topics below. However, there is still a large
gap in the understanding of the accuracy of these methods. The outcome
of the following projects will be important in a broader context.
1. Bayesian semiparametrics
Bayesian methods in statistics go back to Bayes in the 18th century.
They express prior beliefs about a situation in terms of a probability
distribution, and next update this distribution using empirical evidence
concerning the situation under study. These methods were initially
promoted by subjectivist Bayesians, who stressed the subjective nature
of prior beliefs, but have increasingly been adopted by objective
statisticians, who use the Bayesian methods within a classical
statistical set-up. A wide variety of Bayesian procedures can now be
implemented using computer simulation (e.g. “MCMC”), and prior
distributions can be used to model complicated structures arising in a
variety of applications (e.g. classification, curve-fitting, covariate
modelling, network modelling). In the past decade the emphasis has been
on developing and studying algorithms used to implement Bayesian
procedures, using a variety of priors. Although it is known that in an
objective sense many (or even most) prior distributions give adverse
results, inferior to non-Bayesian methods, relatively little research
has been carried out to study the performance of Bayesian methods.
Particularly in the complex settings where prior modelling is thought to
be of help, deeper insight into the effect of choosing a particular prior
is necessary. We intend to study Bayesian procedures for priors on
complex structures, loosely speaking semiparametric models, or models of
high dimension. Priors for functions may be constructed using models for
stochastic processes, such as Gaussian or Lévy processes. For priors on
large discrete structures, such as graphical networks or pedigrees, it
is necessary to develop appropriate asymptotic methods to study the
quality of posteriors.
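The prior-to-posterior update and its implementation by computer simulation can be illustrated on a toy problem that is not part of the project: a success probability θ with a uniform Beta(1, 1) prior, updated with 7 successes in 10 trials. The exact posterior is Beta(8, 4), with mean 8/12 ≈ 0.667, so the output of the random-walk Metropolis sampler below (one of the simplest MCMC algorithms) can be checked against it. This is a minimal sketch; the function names, step size, chain length and seed are illustrative assumptions.

```python
import math
import random


def log_posterior(theta, successes, trials):
    """Unnormalised log posterior of a binomial success probability under a
    uniform Beta(1, 1) prior (hypothetical toy model, for illustration only)."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # the prior puts no mass outside (0, 1)
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def metropolis(successes, trials, n_samples=20000, burn_in=2000, step=0.1, seed=1):
    """Random-walk Metropolis: propose theta' = theta + N(0, step^2) and accept
    with probability min(1, posterior(theta') / posterior(theta))."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, successes, trials)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept/reject on the log scale; 1 - random() lies in (0, 1], so log is defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


samples = metropolis(successes=7, trials=10)
posterior_mean = sum(samples) / len(samples)
print(f"MCMC posterior mean: {posterior_mean:.3f} (exact: {8 / 12:.3f})")
```

In the semiparametric settings targeted here, θ would be replaced by a function or a large discrete structure, and the open question is how the choice of prior affects the quality of the posterior, which this toy check does not address.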
2. High-dimensional models
In the 1980s and 1990s a new branch of statistical modelling was
developed that enables the application of statistical techniques in
situations where less is known a priori about the relationships between
the variables involved. These techniques have been adopted in diverse areas
of application. One may think of the success or failure of a medical
treatment as a function of background variables of patients. Often many
covariates (e.g. age, sex, weight, blood pressure, medical history,
genetic factors) are thought to influence the outcome, but little is
known about the exact numerical relationships. One may also think of an
economic variable such as unemployment or the effect of counseling on
unemployment as a function of person characteristics or other economic
variables. A third example is nonresponse in interviews conducted by
the CBS (Central Bureau of Statistics of The Netherlands). As a fourth
example of application one may think of the effect of air pollution on
health as a function of environmental and population data.
2a. Very high-dimensional models
To measure the causal effect of a treatment or condition (e.g. air
pollution) using observational data it is necessary to include all
variables that could influence the treatment in the analysis.
Semiparametric models as developed in the past allow this, and are much
more flexible than the parametric models that were the standard in the
past, but they still make many assumptions concerning the structure of
the data (linearity, additivity, low dimensionality, homogeneity,
smoothness, etc.). We wish to investigate methods that use still larger
models, which fit reality better. The conclusions that can be drawn
using such larger models will have larger uncertainty margins
(technically: wider confidence intervals). This is unpleasant on the one
hand, but on the other hand these margins give a more honest indication
of the uncertainty given the available data. Many controversial
applications of statistics center around a proper quantification of
uncertainty. For instance, it is simply not easy to estimate the effect
of air pollution on health.
2b. Shape-constrained methods
Shape constraints show up very naturally in different areas of
application, e.g. in earth sciences, medical imaging and survival
analysis. Often it is possible to fit useful models that only impose
these shape constraints. We aim to develop a deeper understanding of
shape-constrained methods, conceptually, computationally and
asymptotically. We also wish to popularize shape-constrained methods in
diverse areas of application.
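A canonical example of a method that imposes nothing but a shape constraint is isotonic (monotone) least-squares regression. The sketch below implements the pool-adjacent-violators algorithm (PAVA), which returns the nondecreasing sequence closest to the data in (weighted) least squares by repeatedly averaging adjacent values that violate the ordering. It is offered as a generic illustration of this class of methods, not as a method developed in the project; the function name is hypothetical.

```python
def isotonic_regression(y, weights=None):
    """Nondecreasing least-squares fit via pool-adjacent-violators (PAVA)."""
    if weights is None:
        weights = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for value, w in zip(y, weights):
        blocks.append([float(value), float(w), 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand the pooled blocks back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


print(isotonic_regression([1, 3, 2, 4, 3.5, 5]))  # → [1.0, 2.5, 2.5, 3.75, 3.75, 5.0]
```

Here the violations 3 > 2 and 4 > 3.5 are each pooled into their average, giving a monotone fit without any parametric assumption on the trend.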
2c. Regularization
Many of the theoretical results for infinite-dimensional parameter
spaces take the form of showing that a certain estimator is pointwise
consistent or has a certain pointwise rate of convergence. For finite
sample sizes, however large, there will often be an infinite set of
models that are consistent with the data. Depending on the problem
under consideration, some form of regularization will be necessary. If,
for example, the model is a semiparametric translation model where the
density is to be estimated, it makes sense to minimize the Fisher
information within the class of models consistent with the data. Without
this, any confidence interval based on the calculated density may be
wildly optimistic. None of the standard methods of estimation is
concerned with this form of global regularization subject to a data fit.
The aim of the project is to develop such methods for some standard
problems in non- and semiparametrics.
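One way to write the translation-model example as an optimisation problem is sketched below. It assumes (our choice for illustration, not fixed by the project) that consistency with the data x_1, …, x_n is measured by the Kolmogorov distance between the model distribution and the empirical distribution function F_n, with a hypothetical tolerance δ_n. Here X_i = θ + ε_i with ε_i having density f and distribution function F, and I(f) is the Fisher information for location.

```latex
\[
  I(f) = \int_{-\infty}^{\infty} \frac{f'(x)^{2}}{f(x)}\,dx,
  \qquad
  \hat f \in \operatorname*{arg\,min}_{f} I(f)
  \quad \text{subject to} \quad
  \inf_{\theta}\, \sup_{x} \bigl| F(x - \theta) - F_n(x) \bigr| \le \delta_n .
\]
```

Since the asymptotic variance of an efficient estimator of θ is 1/(n I(f)), choosing the least informative density that is still compatible with the data guards against the over-optimistic confidence intervals mentioned above.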
2d. Sparsity
An aspect of high-dimensional models that can often be used to
construct successful predictors is sparsity. A model is called sparse if
the parameter space is high dimensional in principle, but the number of
important parameters is known to be small. Sparsity based methods
automatically select the important parameters from the large set of
potential parameters and give meaning to estimation problems that used
to be classified as unidentifiable, e.g. because the number of
parameters was larger than the number of observed data points. A highly
relevant application of sparse modelling is the analysis of microarray
data in statistical genetics.
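A widely used sparsity-based method is the lasso, which adds an ℓ1 penalty λ‖β‖₁ to the least-squares criterion and thereby sets many coefficients exactly to zero, even when there are more parameters than observations. The sketch below is a minimal pure-Python cyclic coordinate-descent solver for the criterion (1/2n)‖y − Xβ‖² + λ‖β‖₁, written for illustration only; the function names and the fixed number of sweeps are our assumptions, and the lasso is named here as a standard example rather than as the project's method.

```python
def soft_threshold(z, t):
    """Solution of min_b 0.5 * (b - z)**2 + t * |b|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p entries each; p may exceed n.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # Correlation of column j with the partial residual that excludes b[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Orthogonal design: the lasso reduces to soft-thresholding each coordinate,
# and the weaker second effect is set exactly to zero.
X = [[1.0, 1.0], [1.0, -1.0]]
print(lasso_coordinate_descent(X, [3.0, 1.0], lam=1.25))  # → [0.75, 0.0]
```

For a general design the coordinates interact, and the choice of λ (for example by cross-validation) becomes the main practical question.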
3. Statistics of extremes
Over the last two decades, considerable progress has been made in
Statistics of Extremes. This statistical theory is based on Extreme
Value Theory, the probability theory of extremes. A prominent example is
the estimation of quantiles that are so large that they are on the
boundary or outside the range of the dataset. Statistics of Extremes is
one of the subfields of mathematical statistics with the nice feature
that the obtained theoretical results are applied immediately in various
fields, whereas applications often trigger theoretical developments.
Most of the existing theory deals with independent, identically
distributed, one-dimensional random variables, and when the data are
multivariate, the results are typically only of use in dimension 2 or 3.
It is challenging and very useful to extend the present theory to more
complex settings. One extension is to deal with high-dimensional data or
even stochastic processes. Another extension considers time series
instead of independent data. A third extension replaces the identical
distributions by distributions where covariates play a role, the
regression setting. Although some results have been obtained in all
these directions, the majority of problems still have to be addressed.
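For the i.i.d., one-dimensional heavy-tailed case that most existing theory covers, a standard approach to estimating a quantile beyond the range of the data combines the Hill estimator of the extreme value index γ with the Weissman extrapolation formula x̂_p = X_(n−k) (k / (n p))^γ̂, where X_(n−k) is the (k+1)-th largest observation. The sketch below applies this to simulated Pareto data; the sample size, the number k of upper order statistics, the target probability p and the seed are illustrative assumptions, and choosing k well is itself a research question.

```python
import math
import random


def hill_estimator(data, k):
    """Hill estimate of the extreme value index from the k largest observations."""
    x = sorted(data, reverse=True)  # x[0] is the sample maximum
    threshold = x[k]  # the (k+1)-th largest observation, X_(n-k)
    return sum(math.log(x[i]) - math.log(threshold) for i in range(k)) / k


def weissman_quantile(data, k, p):
    """Extrapolated estimate of the quantile exceeded with probability p,
    intended for heavy-tailed data and p possibly below 1/len(data)."""
    n = len(data)
    x = sorted(data, reverse=True)
    gamma = hill_estimator(data, k)
    return x[k] * (k / (n * p)) ** gamma


rng = random.Random(2024)
alpha = 2.0  # Pareto tail index, so the true extreme value index is 1/alpha = 0.5
sample = [rng.paretovariate(alpha) for _ in range(10000)]
p = 1e-5  # far beyond the range of a sample of size 10,000
print("Hill estimate of gamma:", hill_estimator(sample, k=500))
print("Estimated quantile:", weissman_quantile(sample, k=500, p=p))
print("True quantile:", p ** (-1 / alpha))  # about 316.2
```

The extensions listed above (high-dimensional data, time series, covariates) are precisely the settings in which the independence and identical-distribution assumptions behind this sketch fail.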
4. Model choice
Most paradigms of model choice allow only comparisons between
different models. They are based on a single valued fidelity measure and
a single valued penalty term which measures the complexity of the model.
Within this paradigm it is not possible to say whether a particular
model fits the data without reference to other models. One essential
ingredient of scientific practice and the main motor of scientific
advance, that of not being satisfied with any of the models on offer,
is therefore missing. The task of deciding whether the model fits the
data is often relegated to diagnostics, although this is clearly a much
too simplistic and dismissive attitude to the problem. The aim of the
project is to develop direct measures of fit which are multivalued and
give grounds for accepting or rejecting a model on its own merits. These
have to be complemented by measures of simplicity, which again may be
multivalued, with the intention of calculating one or more simplest
models that provide a satisfactory fit to the data, insofar as such
models exist.
5. Multi-dimensional structured Markov processes
Multi-dimensional structured Markov processes (MSMP) form the most
natural model for complex real-world phenomena that exhibit stochastic
behaviour. They are widely applied in fields as diverse as
physical sciences, biology, engineering, computer-communications and
logistics. The adjective “structured” has been added to emphasize that,
in nearly every application, Markov processes possess structural
properties, and these properties should be exploited in the analysis.
Multi-dimensional Markov processes are much less understood than their
one-dimensional counterparts. New methods have to be developed for
obtaining explicit and computable results on the behaviour of MSMP:
matrix-analytic methods, methods to obtain bounds, and asymptotic
methods for studying rare events in MSMP.
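A basic instance of the matrix-analytic methods mentioned above is the quasi-birth-and-death (QBD) process. Its state is a pair (level, phase), and for a level-independent QBD the stationary distribution is matrix-geometric: π_{n+1} = π_n·R, where R is the minimal nonnegative solution of A0 + R·A1 + R²·A2 = 0. The sketch computes R with the simplest fixed-point iteration. Faster schemes such as logarithmic reduction exist. The two-phase example and all parameter values are hypothetical.

```python
import numpy as np


def qbd_rate_matrix(A0, A1, A2, tol=1e-12, max_iter=100_000):
    """Minimal nonnegative solution R of  A0 + R A1 + R^2 A2 = 0  for a
    level-independent continuous-time QBD with generator blocks
    A0 (one level up), A1 (same level), A2 (one level down).
    Natural fixed-point iteration R <- -(A0 + R^2 A2) A1^{-1}, started at 0."""
    m = A0.shape[0]
    R = np.zeros((m, m))
    A1_inv = np.linalg.inv(A1)
    for _ in range(max_iter):
        R_new = -(A0 + R @ R @ A2) @ A1_inv
        if np.max(np.abs(R_new - R)) < tol:
            return R_new
        R = R_new
    raise RuntimeError("no convergence; is the QBD positive recurrent?")


# Sanity check: the M/M/1 queue is a QBD with a single phase, and R = lambda/mu.
R_mm1 = qbd_rate_matrix(np.array([[1.0]]), np.array([[-3.0]]), np.array([[2.0]]))

# Hypothetical two-phase example: arrivals at rate 1.0 or 0.5 depending on a
# background phase that switches at rate 0.3; service rate 2 in both phases.
A0 = np.diag([1.0, 0.5])
A2 = 2.0 * np.eye(2)
T = np.array([[-0.3, 0.3], [0.3, -0.3]])       # phase-switching generator
A1 = T - np.diag(A0.sum(axis=1)) - np.diag(A2.sum(axis=1))
R = qbd_rate_matrix(A0, A1, A2)
print(R_mm1, R, sep="\n")
```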
6. Transient behavior of high-dimensional Markov chains
Transient behaviour of stochastic processes represents systems
before their relaxation to equilibrium, or systems where equilibrium is
either trivial or non-existent. Typical applications where transient
behaviour plays a dominant role are the following. In the PageRank
problem, web pages are ranked according to their current relevance in the
WWW graph, which is modelled as a Markov chain with a huge state space. In
(wireless) communication systems, dimensioning (capacity allocation) must
often be carried out on short time-scales, taking into account, e.g.,
teletraffic bursts due to rush hours. Other applications include large
population dynamics in e.g. biology, neuronal networks, and large flow
models such as those for atmospheric flows. In contrast with equilibrium
behaviour, little is known about transient behaviour. Short-term
behaviour in particular is dominated by transient effects. Short-term
behaviour may be highly sensitive to small perturbations in the intensities
of transitions, for instance due to steering of parameters or uncertainty in
system parameters. Besides the obvious theoretical challenge of
understanding the transient behaviour of stochastic processes, current
and future applications of stochastic processes increasingly require a
detailed understanding of their short-term behaviour. In this respect,
it is challenging to integrate techniques from numerical analysis and from
(partial) differential or difference equations for very large systems
into the framework of stochastic processes, explicitly taking into
account the structure of the algebra of the stochastic process.
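For a small chain, the transient distribution p(t) = p(0)·exp(Qt) can be computed by uniformization. Uniformization is a standard numerical method and is not named in the text; it is shown here only to make "transient behaviour" concrete. It rewrites the continuous-time chain as a discrete-time chain P = I + Q/q observed at the jumps of a Poisson process of rate q. The sketch assumes a small dense generator and moderate q·t. For the huge, structured state spaces discussed above, this is exactly where such naive methods stop being feasible.

```python
import numpy as np


def transient_distribution(Q, p0, t, tol=1e-12, max_terms=100_000):
    """Distribution p(t) = p0 expm(Q t) of a finite CTMC with generator Q,
    computed by uniformization:
        p(t) = sum_k Poisson(k; q t) * p0 P^k,   P = I + Q / q,
    where q is at least the largest exit rate. Assumes q * t is moderate
    (roughly below 700) so that exp(-q t) does not underflow."""
    p0 = np.asarray(p0, dtype=float)
    q = float(np.max(-np.diag(Q)))
    if q == 0.0:
        return p0.copy()                       # no transitions at all
    P = np.eye(Q.shape[0]) + Q / q             # transition matrix of the uniformized chain
    term = p0.copy()                           # p0 P^k, starting at k = 0
    weight = np.exp(-q * t)                    # Poisson probability of k jumps
    result = weight * term
    mass = weight
    k = 0
    while 1.0 - mass > tol and k < max_terms:
        k += 1
        term = term @ P
        weight *= q * t / k
        result += weight * term
        mass += weight
    return result


# Two-state chain: 0 -> 1 at rate a, 1 -> 0 at rate b, started in state 0.
a, b, t = 1.0, 2.0, 0.7
Q = np.array([[-a, a], [b, -b]])
p_t = transient_distribution(Q, [1.0, 0.0], t)
exact = a / (a + b) * (1.0 - np.exp(-(a + b) * t))   # closed form for P(X_t = 1)
print(p_t, exact)
```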
• The key concept that binds all projects is “high dimensionality”.
This is another way of referring to the complexity of present-day
science, industry and society, mentioned in the introduction of the
proposal. There is much information available that can be used to
control and understand systems of importance, from biological networks
to traffic on the internet or on the road, and new techniques to gather
even more information are invented regularly. However, it is often not
easy to use all this information in a useful and accurate manner.
Modelling in terms of probabilities is a fruitful approach in many
situations, but the models must reflect reality in order to lead to
understanding or predictions. This requires studying the properties of
the models, and developing techniques to tune them to the data. Because
the situations are often of a novel nature, “classical” techniques are
often not sufficient, even though classical fundamental concepts are
always a good starting point.
6.2 Mathematical statistical physics
In physics, spatially extended systems consist
of a large number of components that, though interacting only locally,
exhibit a long-range global dependence, resulting in anomalous
fluctuation phenomena and phase transitions. At the microscopic scale, a
random dynamics acts on the components of the system, resulting in an
evolution that is typically highly complex. The key challenge is to give
a precise mathematical treatment of the interesting physics that arises
from this complexity, at the macroscopic scale. Both equilibrium and
non-equilibrium behavior are relevant. There is a strong link with
stochastic networks, through the study of the statics and the dynamics
of random networks of interactions. In addition, many notions and ideas
developed in mathematical statistical physics are slowly making their
way into mathematical biology, e.g. hierarchical structures,
coalescence, universality.
1. Self-organized criticality
Certain classes of physical systems have the property that they
evolve naturally – without fine-tuning of parameters such as temperature
– into a stationary state that behaves in a critical way, e.g.
characterized by power-law decay of correlations. This natural evolution
into a critical state is called self-organized criticality (SOC), and
has been observed and studied in a wide variety of models. Examples are
the Bak-Sneppen model (for evolution of fitness), the Abelian sandpile
model (for the motion of grains in a sandpile), forest fires and
earthquakes. Models with SOC typically exhibit some form of
“avalanches”, i.e., non-local rare events in which a large part of the
system is updated. These avalanches are essential for creating
the self-organized critical state. The mathematical study of models with
such non-local behavior is a challenging new area of interacting
particle systems. The two-dimensional Abelian sandpile model represents
a major challenge, as it is related to logarithmic conformal field
theory and various interesting combinatorial objects, such as spanning
trees and oriented circuits.
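The Abelian sandpile model and its avalanches can be illustrated with a small simulation. The sketch uses the standard toppling rule on a finite square grid: a site with at least 4 grains topples, and grains leaving the boundary disappear. It also checks the Abelian property, namely that the final configuration does not depend on the order in which grains are added. Grid size, threshold handling and function names are assumptions made for the illustration.

```python
import numpy as np


def stabilize(heights):
    """Stabilize an Abelian sandpile on a finite square grid.
    A site holding 4 or more grains topples: it loses 4 grains and each of
    its 4 neighbours gains one; grains sent over the edge leave the system.
    Returns the stable configuration and the avalanche size (number of topplings)."""
    h = np.array(heights, dtype=int)
    topplings = 0
    while True:
        unstable = (h >= 4).astype(int)
        if not unstable.any():
            return h, topplings
        # Toppling all unstable sites at once is a legal order; by the Abelian
        # property the final configuration does not depend on the order.
        topplings += int(unstable.sum())
        h -= 4 * unstable
        h[1:, :] += unstable[:-1, :]          # grain to the site below
        h[:-1, :] += unstable[1:, :]          # grain to the site above
        h[:, 1:] += unstable[:, :-1]          # grain to the right
        h[:, :-1] += unstable[:, 1:]          # grain to the left


def add_grain(heights, site):
    """Drop one grain at `site` and let the resulting avalanche run."""
    h = np.array(heights, dtype=int)
    h[site] += 1
    return stabilize(h)


base = np.full((5, 5), 3)                     # maximal stable configuration
after_center, size = add_grain(base, (2, 2))

# Abelian property: adding grains at (0, 0) and (2, 2) in either order.
first, s1 = add_grain(base, (0, 0))
first, s2 = add_grain(first, (2, 2))
second, t1 = add_grain(base, (2, 2))
second, t2 = add_grain(second, (0, 0))
print(size, np.array_equal(first, second))
```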
2. Metastability
Metastability is a phenomenon where a physical, chemical or
biological system, under the influence of a noisy dynamics, moves
between different regions of its state space on different time scales.
On short time scales the system is in a quasi-equilibrium within a
single region, while on long time scales it undergoes rapid transitions
between quasi-equilibria in different regions. Examples of metastability
are found in biology (folding of proteins), climatology (effects of
global warming), economics (crashes of financial markets), materials
science (anomalous relaxation in disordered media) and physics (freezing
of supercooled liquids). The task of mathematics is to formulate
microscopic models of the relevant underlying dynamics, to prove the
occurrence of metastable behavior in these models on macroscopic
space-time scales, and to identify the key mechanisms behind the
experimentally observed universality in the metastable behavior of whole
classes of systems.
3. Random polymers
A polymer is a long chain consisting of monomers that are tied
together via chemical bonds. The monomers can be either single atoms
(such as carbon) or molecules with an internal structure (such as the
adenine-thymine and cytosine-guanine pairs in the DNA double helix).
Examples of polymers are: proteins, sugars, fats, plastic and rubber.
The chemical bonds are flexible, so that the polymer can arrange itself
in various spatial configurations. The longer the chain, the more
complex these configurations tend to be. For instance, the polymer can
wind around itself to form a knot, attract or repel itself due to the
presence of charges it carries, interact with a surface on which it may
be adsorbed, or live in a wedge between two confining surfaces. The key
challenge is to unravel the complexity in behavior due to the long-range
interactions characteristic of polymer chains. Particularly challenging
are phase transitions as a function of underlying model parameters,
signalling drastic changes in behavior when these parameters cross
critical values.
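A standard lattice model for a polymer with excluded volume is the self-avoiding walk. The text does not name a specific model, so this is offered only as an illustration of how quickly the number of spatial configurations grows with chain length. The sketch enumerates all n-step self-avoiding walks on the square lattice by depth-first search. Their number grows roughly like μⁿ, with connective constant μ ≈ 2.638 on this lattice, so exhaustive enumeration is only feasible for short chains.

```python
def count_saws(n):
    """Number of n-step self-avoiding walks on the square lattice Z^2
    starting at the origin (exhaustive depth-first enumeration)."""
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in steps:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt not in visited:            # excluded volume: no site visited twice
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total

    return extend((0, 0), {(0, 0)}, n)


print([count_saws(n) for n in range(1, 8)])
# [4, 12, 36, 100, 284, 780, 2172]
```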
4. Spin glasses
Spin glasses are prime examples of complex systems. Spin glass
models were introduced in condensed matter physics to describe amorphous
systems (diluted magnetic alloys, structural glasses). Contrary to
homogeneous systems, like crystals or ferromagnets, spin glasses have a
highly non-trivial broken symmetry, with a hierarchical organization of
equilibrium states. This is also reflected in an anomalous dynamical
behavior: slow relaxation, aging, memory effects. The mathematical
description of spin glasses has made considerable progress in recent years.
The rich mathematical structure that arises from the solution of the
mean-field Sherrington-Kirkpatrick model is nowadays the subject of
intensive investigation. Concepts originally introduced in the study of
spin glasses have also found applications in areas as diverse as
combinatorial optimization, neural networks, protein folding and
economics.
5. Non-equilibrium steady states
A paradigm of non-equilibrium is a system in contact with two heat
baths at different temperatures or with particle reservoirs at different
densities. The system will eventually evolve into a stationary state
that is non-equilibrium (i.e., non-reversible, current-carrying) and
that typically shows behavior very different from an equilibrium system.
Of fundamental interest is the correlation structure and the large
deviation behavior of such a non-equilibrium steady state. Typically,
non-local large deviation free energies appear and the system exhibits
persisting long-range correlations.
6. Random surfaces
Properties of random surfaces are of wide scientific interest. Not
only can they be used as models for interfaces between different phases,
describing the separation between gas and liquid or between liquid and
solid, they also play an important role in interacting particle
theories, and they occur in biological systems in the form of (cell or
intracell) membranes. Whether subject to random fluctuations, disordered
environments or external forces (like osmotic pressure), their shapes
and widths, being governed by the size of the fluctuations, are of
wide interest. Random surfaces can in some sense be viewed as
two-dimensional analogues of (one-dimensional) polymers, but their
properties tend to be harder to study. Phase transitions between
different types of behavior are expected to occur, but many of their
properties are as yet ill understood.
7. Critical phenomena in two dimensions
Physicists have extensively studied second order phase transitions
(also known as critical phenomena) using the renormalization group and,
in two dimensions, conformal field theory. These traditional approaches
leave many questions unanswered, especially those that concern the
geometric aspects of critical phenomena. Many physical systems
undergoing a second order phase transition (such as magnetic materials)
present fluctuating boundaries, which assume random shapes that can be
described by conformally invariant random fractal curves. For this
reason, one is led to study stochastic geometric models. In particular,
the recent discovery of the Stochastic Loewner Evolution (SLE) has
produced spectacular developments, has linked conformal field theory to
probability theory and complex analysis, and has opened up new and
unexpected perspectives. Key challenges are: using SLE to obtain a
better understanding of critical phenomena, making further connections
between SLE and some of the most important mathematical models of
statistical mechanics, and explaining why many different models present
the same behavior at or near the phase transition point.
8. Quantum statistics
Quantum statistics aims to do statistical inference (and design of
experiments), to learn about the state of quantum systems and the
working of quantum operations, with such a small amount of data that the
quantum randomness in measurement outcomes is of a size similar to the
signal one wants to extract. Only in the past few years have physicists been able
to work in the laboratory with single or very small numbers of quantum
systems. The field of quantum information has grown explosively, partly
with a view to future nanotechnology at, say, the atomic level. On the
theoretical side, the contours of a Le Cam-type theory of convergence of
quantum statistical experiments are only just beginning to emerge; such a
theory should be used to clarify the presently fragmented and confusing
asymptotic results on optimal state estimation. Topics such as quantum
tomography should benefit from present-day statistical insight into
nonparametric curve estimation and nonparametric inverse problems. The
design of Bell-type experiments, to confirm some of quantum physics’
most startling predictions, is linked to nonparametric missing-data
problems from classical statistics.
• The eight research projects listed above are linked through their
common search for methods to describe complex macroscopic phenomena
based on simple microscopic dynamics. Common tools are large deviation
theory, variational calculus, Gibbs theory and operator theory.
Coalescence is a key notion in metastability and spin glasses,
universality is a driving force of self-organized criticality, random
surfaces and critical phenomena in two dimensions, while ergodic theory
underpins much of random polymers and non-equilibrium steady states.
6.3 Stochastics and the life sciences
Randomness and complexity abound in the life sciences. In genetics, the
use of probabilistic models and statistical techniques has a long
history, but the recent revolution in genomics and proteomics has
changed the outlook dramatically. In the neurosciences and molecular
cell biology, mathematical models play an increasingly important role,
often in the form of networks of various nature (genetic, metabolic,
neural). Stochastics contributes both via the analysis of new
probabilistic models (e.g. time series, random graphs, Markov processes
of many types, particle systems) and via supplying new statistical
techniques that can cope with these new models and the many new data
platforms, often of high-throughput type.
1. Population genetics
The evolution of DNA-sequences is part of population dynamics, with
individuals representing alleles. Individuals are organized in
“colonies”, migrate between colonies, and evolve within a colony due to
resampling, mutation, selection and recombination. It turns out that on
large space-time scales the population behaves in a way that is to a
large extent independent of the precise evolution mechanism within a
single colony. This is because the migration provides an effective way
of “averaging out” over the population. For populations consisting of
only one type of individual this phenomenon is well understood. For
populations consisting of two or more competing types, very little is
known, and a much greater richness of universality classes is expected.
2. Immunology
The human immune system contains 10^7 different types of T-cells.
These T-cells interact with antigens (“antibody generating” cells) and
trigger an immune response. However, the number of different types of
intruder cells is at least 10^6 times larger than the number of available
types of T-cells. Thus the question arises: “How does the system manage
to recognize so many intruder cells in an effective manner, against a
background of our ‘own’ cells that are non-intruders?” Stochastic models
are capable of explaining this efficiency, both qualitatively and
quantitatively: recognition is more robust when it is done in a
stochastic rather than in a specific manner.
3. Sequence alignment
Given two DNA-sequences, how likely is it that the differences in
base pairs occurring along these sequences are due to chance? In other
words, what is the probability that the sequences are not related to
each other via some given evolutionary mechanism? The so-called BLAST
algorithm is a way to weigh the differences according to an appropriate
penalty scheme and, based on the outcome, to accept or reject the
hypothesis that the sequences are not related. This algorithm is used
extensively throughout the genetics community, but so far has been given
very little mathematical foundation. With the help of large deviation
theory, it is possible to estimate the probability of seeing a large
number of differences in stochastically unrelated sequences. This
probability exhibits “Gumbel law” type behavior, known from extreme
value theory. It is a challenge to compute the relevant parameters in
this law, which are needed to provide the appropriate confidence
intervals in statistical estimation.
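For ungapped local alignment, the Gumbel-law parameters have a classical characterization, the Karlin-Altschul theory. The scale parameter λ is the unique positive root of Σ q_i·e^{λ·s_i} = 1, where the s_i are the pair scores and the q_i their probabilities under the "unrelated sequences" null model. The expected number of chance alignments with score at least S is then E = K·m·n·e^{−λS}. This is a minimal sketch that computes λ by bisection. It assumes ungapped alignment and a score distribution that is already known. The constant K is harder to compute, so the value used here is a hypothetical placeholder. For gapped alignments these parameters are generally obtained empirically rather than from such a formula.

```python
import math


def karlin_altschul_lambda(scores, probs, lo=1e-9, hi=50.0, tol=1e-12):
    """Unique positive root lambda of sum_i q_i * exp(lambda * s_i) = 1, where
    s_i are the possible scores of an aligned letter pair and q_i their
    probabilities for unrelated sequences. Requires a negative expected score
    and at least one positive score; assumes hi * max(scores) < ~700."""
    mean = sum(q * s for s, q in zip(scores, probs))
    if mean >= 0 or max(scores) <= 0:
        raise ValueError("need a negative expected score and a positive score")

    def f(lam):
        return sum(q * math.exp(lam * s) for s, q in zip(scores, probs)) - 1.0

    # f is convex with f(0) = 0 and f'(0) = mean < 0: negative just above 0,
    # positive beyond the root, so bisection on [lo, hi] brackets the root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def evalue(S, m, n, lam, K):
    """Expected number of chance local alignments with score >= S between
    sequences of lengths m and n."""
    return K * m * n * math.exp(-lam * S)


# DNA with uniform base frequencies, score +1 for a match and -1 for a mismatch:
# a match has probability 1/4, so the expected score is -1/2 < 0.
lam = karlin_altschul_lambda([1, -1], [0.25, 0.75])    # equals log(3) here
E = evalue(20, 1000, 1000, lam, K=0.1)                 # K = 0.1 is a placeholder
p_value = 1.0 - math.exp(-E)                           # Gumbel (Poisson) approximation
print(lam, E, p_value)
```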
4. Statistical genetics
Statistical genetics is a classic branch of applied statistics, studying
the relationship of phenotypes, such as disease status or quantitative
traits, to genotypes. The field has gained new emphasis with the
advances in cell biology and experimental technology, which make it possible
to measure unprecedented amounts of genotypic and phenotypic data, e.g. from
SNP arrays, expression and CGH arrays, and proteomics profiles. Classical
statistical techniques, such as pedigree analysis, nonparametric linkage
or association analysis, or variance decompositions, have not lost their
value, but must be transformed to take into account the new questions
and data. High-dimensionality is a common theme, and challenge, for all
such extensions. Not only are the data themselves massive. Potential
interactions (e.g. between genes or SNPs) or networks (genetic,
proteomic, metabolic) rapidly multiply the number of possibilities one
needs to investigate. Many traits are thought to depend on many genes,
which may have small and nonadditive effects. Many medicines are thought
to have nonlinear effects that depend on many factors. The techniques in
modern statistical genetics come in great variety. Some add a layer
on top of existing techniques, e.g. multiple testing. Others are based on
probabilistic models for the phenomenon under study, e.g. the coalescent
for DNA evolution. A common theme is that high-dimensional data also
provide high-dimensional noise, and statistical techniques are necessary
to extract the signal. A general challenge is to link up with
statistical and machine learning methods for model selection and
adaptation that have been developed in theoretical statistics and other
areas of application.
5. Survival analysis
How can genetic and historical information be combined to better
predict the residual life of a person or an illness? The analysis of
(censored) survival data has received much attention during the past
decades. Asymptotic theory and algorithms exist for estimators in
complicated models, but there are still big challenges ahead. One of
those is to connect shape constrained survival models to high
dimensional covariates. Such models can be used for prediction purposes,
combining historical (often censored) information with usually high
dimensional (genetic) data for the patient at hand. There is a strong
link with Section 6.1.
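The basic building block for censored survival data is the Kaplan-Meier estimator. It is shown here only as the classical nonparametric baseline, not as the shape-constrained, high-dimensional models the project aims at. At each observed failure time, the survival estimate is multiplied by one minus the fraction of subjects at risk who fail. Censored subjects leave the risk set without contributing a failure. Function names and the toy data are illustrative.

```python
import numpy as np


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t) from right-censored
    data. events[i] = 1 if times[i] is an observed failure, 0 if censored.
    Returns the distinct failure times and S just after each of them."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    fail_times = np.unique(times[events == 1])
    surv = []
    s = 1.0
    for t in fail_times:
        at_risk = np.sum(times >= t)                     # still under observation at t
        deaths = np.sum((times == t) & (events == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return fail_times, np.array(surv)


t, s = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
# failure times 1, 3, 4 with S = 0.75, 0.375, 0.0; the subject censored at
# time 2 leaves the risk set without contributing a failure
print(t, s)
```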
6. Biological networks
Building mathematical models for biological networks and developing
statistical techniques for the analysis of corresponding data is at the
core of many areas: the investigation of gene regulatory networks in
genetics and molecular biology, of neuronal networks in neuroscience,
and of metabolic networks in molecular and cellular biology, all
generate complex modelling problems and new statistical issues. Because
the characteristics of these networks and the biological questions about
them differ, the required modelling and analysis tools are different for
each field. Here too, high-dimensionality is a common theme, as is the
complex interaction structure between the different network components
of the same and of different types. Typically, interactions between genes
or gene products, between spiking neurons, or between local field
potentials or EEG and MEG signals of different brain areas are
investigated pairwise or by means of classical techniques like cluster
analysis. Although this still yields important information, there is a
need for new multivariate techniques, in the time domain as well as in
the frequency domain, with which more complex network connectivity
patterns can be inferred. For more detailed investigations model based
techniques are needed. Development of probabilistic models together with
appropriate parameter estimation methods for different types of networks
is therefore an important issue. Candidate models range from dynamic
Bayesian networks or counting processes to hidden Markov models.
Bayesian estimation techniques together with simulation based methods
such as MCMC are expected to play a major role, but they need to be
tuned to the high dimensional context of these applications.
Traditionally, (elements of) metabolic networks have been modeled
primarily by deterministic models consisting of sets of differential
equations. New technical developments have resulted in larger and more
complex data sets in this area and in the possibility to look at one and
the same biological question from different angles and on different
scales. This means that several sources of randomness or noise – due to
small space or time scales, or stemming from the observation of
individual objects instead of groups/populations of objects – need to be
taken into account for accurate modelling and parameter estimation. Besides
building stochastic models that can be incorporated in or connected
to the deterministic ones, another challenge is to adapt existing or
develop new statistical techniques that can deal with the analysis of
data based on such combined and complex models.
• Most of the projects listed above require input from a wide range
of topics in stochastics, often used in a joint manner. For instance,
statistical analysis of biological systems requires probability models,
but realistic probability models can only be built by comparing their
outputs statistically to data. The projects will be carried out in
continuous interaction with biologists, psychologists, medical
scientists, and other scientists from the life sciences. The results
will be mathematical and methodological progress, but should also have a
direct impact on the subject matter sciences. Several research topics
also include a notable interaction with computer science.
6.4 Stochastic networks
Congestion phenomena occur when resources (machines at a factory,
elevators, telephone lines, traffic lights) cannot immediately render
the amount or the kind of service required by their users. Similar
queueing phenomena also arise, at the byte level, in modern
data-handling technologies (communication systems, computer networks);
they are typically less visible, but their effects at the user level are
usually no less serious. Such congestion phenomena are often very
effectively studied by mathematical methods from queueing theory.
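A classical example of such a queueing result is the Erlang loss formula for telephone lines. It gives the probability that a call finds all c lines busy when the offered load is a = (arrival rate) × (mean call duration). The sketch uses the standard stable recursion B(0) = 1, B(c) = a·B(c−1)/(c + a·B(c−1)), together with a dimensioning helper. The helper name, the load of 10 Erlang and the 1% target are illustrative assumptions.

```python
def erlang_b(servers, offered_load):
    """Blocking probability of an M/G/c/c loss system (Erlang loss formula),
    via the numerically stable recursion B(c) = a B(c-1) / (c + a B(c-1))."""
    b = 1.0
    for c in range(1, servers + 1):
        b = offered_load * b / (c + offered_load * b)
    return b


def lines_needed(offered_load, target):
    """Smallest number of lines whose blocking probability is at most target."""
    c, b = 0, 1.0
    while b > target:
        c += 1
        b = offered_load * b / (c + offered_load * b)
    return c


print(lines_needed(10.0, 0.01))   # → 18
```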
Congestion control in stochastic networks is an extremely active
area of research. One of the key reasons for its vitality is
that, time and time again, interesting new questions from application
areas like computer-communications and manufacturing give rise to new
and challenging queueing problems. Much research is being triggered by
the need to understand and control these highly complex systems, and
thus to improve their design and performance.
Presently, novel
communication networks (wireless, peer-to-peer, ad-hoc) are giving rise
to random graph models and systems that exhibit some form of
self-organization. The performance analysis of such networks requires
the use of techniques from queueing theory and statistical physics. The
cluster will stimulate interaction of researchers from both fields.
1. Simultaneous resource possession
In classical queueing networks customers (jobs, particles, products,
transaction requests) move through the network requiring service from
one service entity at a time and only request resources from another
node after having released the previous one. However, in many real-life
situations it is much more natural to allow customers to consume
multiple resources simultaneously. Typical application areas are
production (multi-type, multi-skill product composition) systems, and
wireless communication networks (mobile ad-hoc networks, mesh networks)
and computer-communication systems with intensive software/hardware
interaction (application servers, middleware).
The phenomenon of simultaneous resource possession (SRP) causes
strong dependencies between the operations of service nodes and the
residence times of individual customers in the system, which opens up a
wealth of challenging questions regarding the performance and control of
such systems. Motivated by this, models with SRP have attracted
considerable attention. For example, stability for such systems has been
shown to be non-trivial, even in seemingly simple small-scale toy
examples. Particular emphasis has been put on the use of fluid limit and
diffusion scaling techniques. Another main line of research has focused
on “computable” resource sharing strategies.
Today, an in-depth
understanding of the fundamental dynamics of these systems is lacking.
In this context, our aim is to further study systems with SRP,
particularly by means of a variety of asymptotic analysis techniques. Besides
fluid and diffusion scalings, the system complexity may be reduced
under scaling regimes such as heavy traffic and nearly complete
decomposability, which are often natural for the applications mentioned
above. Singular perturbation methods for infinite-dimensional Markov
processes may be used to study these complex systems in a suitable
asymptotic regime. In particular, by equipping the “first” order limit
with higher order refinements, the applicability of the chosen regime
can be studied and adapted when necessary.
2. Online control of queueing systems
In many systems some form of control exists: jobs may have to be
ordered before processing, jobs can be routed to different servers, and
admission decisions have to be taken. In general, the class of all
control policies is huge, but sometimes it can be shown that the optimal
policy lies within a class of policies that is characterized by only a
few parameters. Even when this is not the case, restricting attention to a
properly chosen subset of policies is not far from optimal. Typical
questions that arise are: How to determine the proper set of control
parameters? What is the (near-)optimal decision given these parameters?
A complicating factor is that these control parameters usually depend on
other parameters, such as the projected number of new customers that are
about to arrive. The standard approach is to first use statistical techniques
to estimate parameter values and then to look for the optimal policy
within the chosen set. The main disadvantage of this approach, however,
is that in practice these parameters fluctuate and are hard to estimate.
This raises the need for the development of effective yet simple online
congestion-control techniques in which parameter estimation is an
integral part of the policy selection process.
3. Random graphs and complex networks
Empirical studies on real networks, such as the Internet, the
World-Wide Web, social and sexual networks, and networks describing
protein interactions, show fascinating similarities. Most of these
networks are “small-worlds” (meaning that typical distances in the
network are small) and have “power-law degree sequences” (meaning that
the number of vertices with degree k falls off as an inverse power of
k). Motivated by these empirical findings, random graph models have been
proposed to model and explain these phenomena. While the proposed models
are quite different in nature, they behave rather universally, in the
sense that many graph properties, such as the typical graph distance,
the amount of clustering in the graph, its diameter and its connectivity
properties, depend in a similar way on the degree sequence of the graph.
Topological properties of networks are crucial for many processes living
on these networks, such as the spread of a disease in a social network,
viruses on the Internet, HIV in a sexual network, or search engines on the
WWW.
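One standard random graph model with a prescribed, possibly power-law, degree sequence is the configuration model. Each vertex receives as many half-edges as its degree, and all half-edges are paired uniformly at random. The text does not single out a model, so this is an illustrative choice. The sketch samples i.i.d. degrees with P(D ≥ k) = k^{−(τ−1)} and builds the resulting multigraph. The exponent τ = 2.5, the graph size and the function names are hypothetical.

```python
import numpy as np


def power_law_degrees(n, tau, rng, d_min=1):
    """n i.i.d. degrees with P(D >= k) = (k / d_min) ** -(tau - 1) for integer
    k >= d_min, i.e. a power-law degree distribution with exponent tau.
    One degree is increased by 1 if needed to make the total even."""
    u = 1.0 - rng.uniform(size=n)                         # uniform on (0, 1]
    d = np.floor(d_min * u ** (-1.0 / (tau - 1.0))).astype(int)
    if d.sum() % 2 == 1:
        d[0] += 1                                         # half-edges must pair up
    return d


def configuration_model(degrees, rng):
    """Configuration model: give vertex v exactly degrees[v] half-edges and pair
    all half-edges uniformly at random. Returns an (m, 2) edge array of a
    multigraph; self-loops and multiple edges can occur."""
    stubs = np.repeat(np.arange(len(degrees)), degrees)
    rng.shuffle(stubs)                                    # uniform random matching
    return stubs.reshape(-1, 2)


rng = np.random.default_rng(2)
degrees = power_law_degrees(10_000, tau=2.5, rng=rng)
edges = configuration_model(degrees, rng)
self_loops = int(np.sum(edges[:, 0] == edges[:, 1]))
print(degrees.max(), len(edges), self_loops)
```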
4. Excitable media
Consider a network with a large number of nodes, each of which can
be in two states: ‘on’ (alert) or ‘off’ (recovering). A node in state
‘on’ can be triggered by signals from outside. When this happens, the
signal spreads instantly over the entire connected component of
‘on’-nodes, after which all these nodes are turned ‘off’ and need a
(random) recovery time before they turn ‘on’ again. In these systems
typically the number of signals from outside, per node per unit time, is
small. Examples are forest-fires (where the nodes are locations of
trees, signals correspond to ignitions, and ‘on’ and ‘off’ stand for a
tree being present or absent), networks of neurons, and rapidly
spreading infections. These systems exhibit a form of self-organization,
although how this arises is still poorly understood.
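The excitable-medium dynamics described above can be sketched on a square grid in the spirit of the Drossel-Schwabl forest-fire model. A signal at an 'on' node instantly switches off its whole connected cluster of 'on' nodes. Each 'off' node recovers independently with a fixed probability per time step, so the recovery times are geometric. That recovery mechanism, the 4-neighbour connectivity, the parameters and the function names are all assumptions made for the illustration.

```python
import numpy as np
from collections import deque


def ignite(grid, site):
    """An outside signal at `site` instantly turns off the whole connected
    cluster of 'on' (1) nodes containing it (4-neighbour connectivity).
    Returns the new grid and the cluster size (0 if the node was 'off')."""
    g = grid.copy()
    if g[site] == 0:
        return g, 0                                   # signal hits a recovering node
    rows, cols = g.shape
    queue = deque([site])
    g[site] = 0
    burned = 1
    while queue:                                      # breadth-first search of the cluster
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and g[nr, nc] == 1:
                g[nr, nc] = 0
                burned += 1
                queue.append((nr, nc))
    return g, burned


def step(grid, p_recover, p_signal, rng):
    """One time step: each 'off' node turns 'on' with probability p_recover,
    then each node receives an outside signal with small probability p_signal."""
    g = grid.copy()
    g[(g == 0) & (rng.uniform(size=g.shape) < p_recover)] = 1
    total = 0
    for site in zip(*np.nonzero(rng.uniform(size=g.shape) < p_signal)):
        g, burned = ignite(g, site)
        total += burned
    return g, total


rng = np.random.default_rng(4)
grid = np.zeros((64, 64), dtype=int)
sizes = []
for _ in range(500):
    grid, burned = step(grid, p_recover=0.01, p_signal=1e-4, rng=rng)
    sizes.append(burned)
print(max(sizes), grid.mean())
```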
• The four research projects listed above are deeply intertwined. For
example, online control problems in queueing systems in which the
simultaneous possession of resources plays a predominant role occur
naturally in the modelling of many computer-communication systems (e.g.
in the derivation of effective thread-spawning algorithms in file
servers that may boost the performance of such servers). Another
example, currently a “hot topic”, is the online control of stochastic
networks in which the topology is largely subject to randomness (e.g.
mobile ad-hoc networks and sensor networks). A final example concerns
the problems related to excitable media spreading signals over connected
components, which is closely related to the spreading of viruses over
the Internet and has links with invasion percolation in statistical
physics.
6.5 Stochastic finance and econometrics
In the field of economics, stochastics can
contribute to finance and econometrics in particular. Econometrics has a
strong ongoing link to statistics. Main concerns in mathematical finance
are risk management, derivative pricing and portfolio optimisation. This
field has grown exponentially in the past decade, and is dominated by
stochastic modelling. The whole spectrum of stochastics is relevant,
with examples ranging from questions in applied probability to notions
usually encountered in statistical quantum physics. There is ample
opportunity for a larger contribution of the Dutch stochastic community,
in particular in the core field of derivatives.
1. Derivatives
Derivative securities such as options are
financial instruments that allow taking out or reducing the risks of
financial transactions. The markets for these instruments have grown
worldwide in size and importance, and mathematical models are
instrumental for determining the prices on these markets.
To capture market realities better, one
seeks models more general than Brownian motion, for instance in terms of
Lévy processes. Such models entail new questions of both a conceptual
and practical nature: pricing in incomplete markets, calibration of
martingale measures, or the fine structure of asset prices given
high-frequency data, to name but a few. High-dimensional concepts are
instrumental here once more. The stochastic volatility models developed
in extension of the classical but simplifying Black-Scholes model depend
on stochastic surfaces or higher-dimensional stochastic manifolds of
finite dimension; models for fixed income or credit derivatives work
with infinite dimensional stochastic (Banach) manifolds. Methods to
compute derivatives prices in these models are generally lacking, let
alone explicit pricing formulas.
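As an illustration of the step from Brownian motion to Lévy-driven models, the following minimal Python sketch prices a European call under the Black-Scholes model and under Merton's jump-diffusion, one of the simplest exponential-Lévy models (Brownian motion plus normally distributed log-jumps arriving at Poisson times). For this particular model the price is still available as a Poisson mixture of Black-Scholes prices, which is used here to check a plain Monte Carlo estimator; for most of the models discussed above no such formula exists. The function names and all parameter values are hypothetical choices for illustration only.

```python
import math

import numpy as np


def norm_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def bs_call(s0, k, r, sigma, t):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def merton_call_series(s0, k, r, sigma, t, lam, mu_j, sig_j, n_terms=60):
    """Merton jump-diffusion call price as a Poisson mixture of Black-Scholes prices.

    Log-jumps are N(mu_j, sig_j**2) and arrive at Poisson rate lam.
    """
    kappa = math.exp(mu_j + 0.5 * sig_j**2) - 1.0  # E[e^J] - 1
    lam_p = lam * (1.0 + kappa)
    price = 0.0
    for n in range(n_terms):
        weight = math.exp(-lam_p * t) * (lam_p * t) ** n / math.factorial(n)
        r_n = r - lam * kappa + n * math.log(1.0 + kappa) / t
        sigma_n = math.sqrt(sigma**2 + n * sig_j**2 / t)
        price += weight * bs_call(s0, k, r_n, sigma_n, t)
    return price


def merton_call_mc(s0, k, r, sigma, t, lam, mu_j, sig_j, n_paths=400_000, seed=1):
    """Monte Carlo price of the same call, sampling S_T directly."""
    rng = np.random.default_rng(seed)
    kappa = math.exp(mu_j + 0.5 * sig_j**2) - 1.0
    n_jumps = rng.poisson(lam * t, size=n_paths)
    # Given n jumps, the summed log-jump is N(n * mu_j, n * sig_j**2).
    jumps = n_jumps * mu_j + np.sqrt(n_jumps) * sig_j * rng.standard_normal(n_paths)
    # The drift is compensated by lam * kappa so that exp(-r t) S_t is a martingale.
    diffusion = ((r - lam * kappa - 0.5 * sigma**2) * t
                 + sigma * math.sqrt(t) * rng.standard_normal(n_paths))
    s_t = s0 * np.exp(diffusion + jumps)
    return math.exp(-r * t) * float(np.maximum(s_t - k, 0.0).mean())


params = dict(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0)
print(bs_call(**params))  # about 10.45
print(merton_call_series(**params, lam=0.5, mu_j=-0.1, sig_j=0.15))
print(merton_call_mc(**params, lam=0.5, mu_j=-0.1, sig_j=0.15))
```

Only the line that samples `s_t` depends on the model, which is why simulation is the usual fallback when no pricing formula is known, at the cost of a statistical error that decreases only like the inverse square root of the number of paths.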
Building on existing competence and expertise, we shall pursue a
two-pronged approach that combines the development of constructive
methods with conceptual understanding. The main lines of research to be
pursued initially are the following.
1a. Probabilistic structure of fundamental constructions in derivatives
Time-averaging of stochastic processes is a typical example of such a
construction; it underlies, for instance, Asian options, whose payoff
depends on the average price of the underlying. In the Brownian case the
difficulties of this construction have been resolved only recently, by
establishing non-obvious but characteristic connections with other
branches of mathematics, in particular with harmonic analysis on
Poincaré's upper half-plane. We will now study, in particular, the
structure of time averages of Lévy processes.
1b. Stochastics for new derivatives
Widely traded classes of options, in particular path-dependent options,
suffer from intrinsic problems. We shall develop the stochastics
concepts, and the resulting new financial instruments, that make it
possible to cope with these problems. Barrier options, a typical
example, are prone to problems caused by the discontinuity that arises
when the underlying hits the barrier. Various ways to remedy these
problems have been proposed in the Brownian case. We will complete the
study of the explicit structure of the occupation-time and
excursion-theoretic concepts on which these proposals are based in the
Brownian case, and then seek the relevant concepts and extensions of
these results, in particular for Lévy processes.
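One remedy of the occupation-time type is the so-called step option, in which the knock-out at the barrier is replaced by a gradual discount that depends on the time the underlying spends beyond it. The following Python sketch, assuming Black-Scholes dynamics with hypothetical parameters and a discrete monitoring grid, prices a down-and-out step call with payoff `exp(-rho * tau) * max(S_T - K, 0)`, where `tau` is the time spent below the barrier and `rho` is a hypothetical knock-out rate, alongside the corresponding discretely monitored down-and-out call.

```python
import math

import numpy as np


def step_call_prices(s0, k, barrier, r, sigma, t, rhos,
                     n_steps=250, n_paths=20_000, seed=2):
    """Monte Carlo prices of down-and-out step calls, plus the barrier limit.

    Step-call payoff: exp(-rho * tau) * max(S_T - K, 0), where tau is the
    time spent below `barrier`, approximated on the monitoring grid.
    rho = 0 gives the plain call; as rho grows the price tends to that of a
    discretely monitored down-and-out call. All rhos share the same paths.
    """
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_paths = math.log(s0) + np.cumsum(
        (r - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z, axis=1)
    paths = np.exp(log_paths)
    occupation = dt * (paths < barrier).sum(axis=1)   # occupation time below barrier
    vanilla = np.maximum(paths[:, -1] - k, 0.0)
    disc = math.exp(-r * t)
    step_prices = [disc * float(np.mean(np.exp(-rho * occupation) * vanilla))
                   for rho in rhos]
    knock_out = disc * float(np.mean(vanilla * (occupation == 0.0)))
    return step_prices, knock_out


prices, knock_out = step_call_prices(100.0, 100.0, 90.0, 0.05, 0.25, 1.0,
                                     rhos=[0.0, 1.0, 10.0, 100.0])
print(prices, knock_out)
```

Because the payoff depends continuously on the occupation time, a path that only briefly crosses the barrier loses part of its value rather than all of it; letting `rho` grow recovers the barrier option.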
1c. Constructive stochastics methods for high-dimensional valuation problems
Initially we will look at problems depending on stochastic surfaces, as
they arise in stochastic volatility models. We shall seek extensions of
the methods developed in the Brownian case, which are based on
orthogonal series and have proved to be computationally efficient there.
Extensions to higher-dimensional situations will form the second step of
the program.
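As an example of an orthogonal-series method in the Brownian case, the sketch below uses the Fourier-cosine (COS) expansion, a well-known technique chosen here as a stand-in and not necessarily the method the project builds on. The density of the log-moneyness `y = log(S_T / K)` is expanded in a cosine series on a truncated interval `[a, b]`, whose coefficients come directly from the characteristic function, so that the call price becomes a finite sum. The truncation width `L` and the number of terms are hypothetical tuning choices, and the result is checked against the Black-Scholes formula.

```python
import math

import numpy as np


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def bs_call(s0, k, r, sigma, t):
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def cos_call(s0, k, r, sigma, t, n_terms=128, L=10.0):
    """European call via a truncated Fourier-cosine series (COS method).

    y = log(S_T / K) is expanded on [a, b]; only the characteristic
    function of the log-return log(S_T / S_0) is model-specific.
    """
    x = math.log(s0 / k)
    c1 = x + (r - 0.5 * sigma**2) * t      # mean of y
    c2 = sigma**2 * t                      # variance of y
    a, b = c1 - L * math.sqrt(c2), c1 + L * math.sqrt(c2)   # assumes a < 0 < b
    u = np.arange(n_terms) * math.pi / (b - a)
    # Characteristic function of the GBM log-return (the model-specific part).
    phi = np.exp(1j * u * (r - 0.5 * sigma**2) * t - 0.5 * sigma**2 * u**2 * t)
    # Cosine coefficients of the payoff K * (e^y - 1) on [c, d] = [0, b].
    c, d = 0.0, b
    chi = (np.cos(u * (d - a)) * math.exp(d) - np.cos(u * (c - a)) * math.exp(c)
           + u * np.sin(u * (d - a)) * math.exp(d)
           - u * np.sin(u * (c - a)) * math.exp(c)) / (1.0 + u**2)
    psi = np.empty(n_terms)
    psi[0] = d - c
    psi[1:] = (np.sin(u[1:] * (d - a)) - np.sin(u[1:] * (c - a))) / u[1:]
    v = 2.0 / (b - a) * k * (chi - psi)
    terms = np.real(phi * np.exp(1j * u * (x - a))) * v
    terms[0] *= 0.5                        # first cosine term carries weight 1/2
    return math.exp(-r * t) * float(terms.sum())


print(cos_call(100.0, 100.0, 0.05, 0.2, 1.0), bs_call(100.0, 100.0, 0.05, 0.2, 1.0))
```

Only the array `phi` is model-specific: replacing it by the characteristic function of another model, such as an exponential-Lévy or a stochastic volatility model, leaves the rest of the computation unchanged, although the interval `[a, b]` must then be chosen from that model's cumulants.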
1d. Discrete and lattice methods in derivatives
Recent classes of fixed-income derivatives combine discrete
construction features with continuous-time, continuous-space models. To
address valuation problems of this type, we shall seek to adopt operator
calculus methods and methods from lattice models in statistical quantum
physics. We shall emphasise the ability to estimate pricing kernels as
well as the ability to compute prices with high precision.
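The simplest discrete lattice method in derivatives is the Cox-Ross-Rubinstein binomial tree. The following Python sketch, a stand-in rather than the operator-calculus or statistical-physics methods mentioned above, computes the Arrow-Debreu state prices at the terminal nodes of such a lattice. These state prices form a discrete pricing kernel: any European payoff is priced as a weighted sum over them. Parameters are hypothetical.

```python
import math

import numpy as np


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def bs_call(s0, k, r, sigma, t):
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def crr_state_prices(s0, r, sigma, t, n_steps):
    """Terminal node prices and Arrow-Debreu state prices on a CRR binomial lattice.

    The state price of node j (j up-moves) is exp(-r t) * C(n, j) q^j (1-q)^(n-j),
    the value today of one unit paid if and only if that node is reached.
    """
    dt = t / n_steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    q = (math.exp(r * dt) - down) / (up - down)       # risk-neutral up-probability
    j = np.arange(n_steps + 1)                         # number of up-moves
    s_t = s0 * up**j * down**(n_steps - j)
    # Binomial coefficients in log space to avoid overflow for large n_steps.
    log_binom = np.array([math.lgamma(n_steps + 1) - math.lgamma(i + 1)
                          - math.lgamma(n_steps - i + 1) for i in range(n_steps + 1)])
    probs = np.exp(log_binom + j * math.log(q) + (n_steps - j) * math.log(1.0 - q))
    return s_t, math.exp(-r * t) * probs


def lattice_price(payoff, s0, r, sigma, t, n_steps=500):
    """Price of a European claim as the state-price-weighted sum of its payoffs."""
    s_t, state_prices = crr_state_prices(s0, r, sigma, t, n_steps)
    return float(np.dot(state_prices, payoff(s_t)))


call = lattice_price(lambda s: np.maximum(s - 100.0, 0.0), 100.0, 0.05, 0.2, 1.0)
print(call, bs_call(100.0, 100.0, 0.05, 0.2, 1.0))
```

The state prices sum to the discount factor `exp(-r t)`, and weighting the terminal prices with them returns the spot price, the discrete counterpart of the martingale property; as the number of steps grows, the lattice price of a call approaches the Black-Scholes value.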
1e. Stochastic structure of infinite-dimensional derivative valuation problems
The evolution of financial variables such as interest rates, credit
ratings and many others is typically governed by stochastic differential
equations (SDEs) or stochastic partial differential equations (SPDEs),
and thus takes place in infinite-dimensional stochastic manifolds. On
the one hand, we will investigate the structure of these manifolds when
additional constraints, such as positivity, must be satisfied. This will
be approached by further developing the connections with stochastic
geometry, Malliavin calculus, and the theory of pseudodifferential and
Fourier integral operators. On the other hand, we will seek efficient
methods for approximating the corresponding infinite-dimensional
valuation problems by finite-dimensional ones (using Banach bases or
generalised orthogonal function spaces), and thus develop constructive
approaches to the explicit valuation of derivatives in such situations.
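A forward-rate curve, the map from maturity to forward rate, is one point of an infinite-dimensional function space, and a finite-dimensional approximation replaces it by finitely many coordinates in an orthogonal basis. The Python sketch below projects a hypothetical curve of Nelson-Siegel shape (all parameters invented for illustration) onto Legendre polynomials on [0, 30] years, using NumPy's least-squares `Legendre.fit`, and reports the maximal approximation error for several truncation degrees. It illustrates only the static approximation step, not the stochastic dynamics or the positivity constraints discussed above.

```python
import numpy as np
from numpy.polynomial import Legendre


def nelson_siegel_forward(x, beta0=0.04, beta1=-0.02, beta2=0.01, tau=2.0):
    """Hypothetical forward curve of Nelson-Siegel shape; x = maturity in years."""
    return beta0 + beta1 * np.exp(-x / tau) + beta2 * (x / tau) * np.exp(-x / tau)


def legendre_max_error(curve, max_maturity, degree, n_grid=601):
    """Max abs error of a degree-`degree` Legendre least-squares fit on [0, max_maturity].

    The fitted series has degree + 1 coefficients: the finite-dimensional
    coordinates that replace the whole curve.
    """
    x = np.linspace(0.0, max_maturity, n_grid)
    y = curve(x)
    approx = Legendre.fit(x, y, degree, domain=[0.0, max_maturity])
    return float(np.max(np.abs(approx(x) - y)))


for degree in (2, 4, 8, 16):
    print(degree, legendre_max_error(nelson_siegel_forward, 30.0, degree))
```

For a smooth curve the error falls quickly with the degree, so a handful of coefficients can stand in for the whole curve; the research question above is how to achieve this for the valuation problem itself, with the curve evolving stochastically.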
2. Time series and risk modelling
Financial risk management requires the historical analysis of financial
time series in order to build accurate models that can predict future
exposure to risk. Because of the Basel agreements on capital
requirements, which regulate the internal risk management of financial
institutions, the demand for accurate models, and for their
understanding and statistical estimation, is greater than ever. Two main
challenges are the modelling of the dependence between changes in the
values of financial assets and the modelling of their extremal
behaviour. Typical portfolios consist of hundreds of assets whose joint
behaviour is not independent, yet cannot be captured by simple measures
such as correlations. Since risk is concerned with extreme values of
assets (e.g. “value-at-risk”), it is particularly important to
understand the dependence when one or more of the time series take large
values. Hidden Markov models and other stochastic process models are
increasingly used to model the dependencies between the prices of
assets. We contribute to this field through the study of extremes and
the development of statistical methods for stochastic processes.
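A minimal Python sketch of two basic tools for extremal risk: historical value-at-risk, the empirical quantile of losses, and the Hill estimator of the tail index, which measures how heavy the loss tail is. Since no data accompany this plan, the returns are synthetic draws from a scaled Student-t distribution with 3 degrees of freedom (whose tail index is 3); the confidence level and the number `k` of upper order statistics are hypothetical choices.

```python
import numpy as np


def historical_var(returns, level=0.99):
    """Historical value-at-risk: the `level` quantile of the losses -returns."""
    return float(np.quantile(-returns, level))


def hill_tail_index(returns, k):
    """Hill estimate of the tail index alpha of the losses, from the k largest."""
    losses = np.sort(-returns)[::-1]            # losses in descending order
    if losses[k] <= 0.0:
        raise ValueError("need at least k + 1 positive losses")
    # Mean log-excess of the k largest losses over the (k+1)-th largest estimates 1/alpha.
    gamma = float(np.mean(np.log(losses[:k]) - np.log(losses[k])))
    return 1.0 / gamma


# Synthetic data (an assumption): scaled Student-t returns with 3 degrees of freedom.
rng = np.random.default_rng(42)
returns = 0.01 * rng.standard_t(df=3, size=100_000)
print("99% VaR:", historical_var(returns))
print("Hill tail index (k = 1000):", hill_tail_index(returns, 1000))
```

The Hill estimate is sensitive to the choice of `k`, a bias-variance trade-off that is itself a topic in the statistics of extremes. The sketch treats a single series; the dependence between extremes of several series discussed above requires further tools.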
3. Credit risk and insurance
Credit risk is the risk of default by a party to a credit contract. The
risks in the credit market are huge: they typically involve events
(defaults) that occur with very low probability but concern huge
amounts of money. Credit derivatives are used as insurance and can be
very complex. There are essentially two well-accepted approaches to the
modelling of credit risk. The first is a structural approach, which
links the occurrence of default directly to the behaviour of the firm’s
value: default happens if the firm value falls below a certain low
threshold. This approach uses techniques and stochastic processes that
are also used in equity modelling. The other is an intensity-based (or
reduced-form) approach, in which default happens exogenously. The
default time can then be modelled as the jump time of a counting
process, and there is a close connection to queueing. Stylised features
of financial data in a credit-risk setting are non-normal returns, heavy
tails and certain jump dynamics. Defaults and credit risk are driven by
shocks to the economy or to individual firms. Modelling default risk
without jump dynamics is unrealistic and severely underestimates the
risks present. We shall study what our results on averaging
constructions imply for reduced-form models. We shall further pursue
applications of derivative-pricing methods and results to insurance
constructions, such as options embedded in insurance contracts.
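The two approaches can be compared in a few lines of Python. The sketch assumes the simplest structural setting, Merton's model, in which the firm value is a geometric Brownian motion and default is checked only at the horizon, and the simplest reduced-form setting, a constant default intensity `lam`, so that the default time is the first jump of a Poisson process. All numbers are hypothetical.

```python
import math

import numpy as np


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def merton_default_prob(v0, debt, mu, sigma, t):
    """Structural (Merton-type) model: P(V_t < debt) for lognormal firm value V."""
    d2 = (math.log(v0 / debt) + (mu - 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    return norm_cdf(-d2)


def intensity_default_prob(lam, t):
    """Reduced-form model with constant intensity: P(tau <= t) = 1 - exp(-lam t)."""
    return 1.0 - math.exp(-lam * t)


def intensity_default_prob_mc(lam, t, n=200_000, seed=3):
    """Same probability, simulating tau as the first jump time of a Poisson process."""
    rng = np.random.default_rng(seed)
    tau = rng.exponential(scale=1.0 / lam, size=n)   # first jump time ~ Exp(lam)
    return float(np.mean(tau <= t))


for horizon in (0.1, 5.0):
    print(horizon,
          merton_default_prob(100.0, 70.0, 0.05, 0.2, horizon),
          intensity_default_prob(0.02, horizon))
```

Over a short horizon the diffusion-based structural probability is many orders of magnitude smaller than the intensity-based one, a small instance of the observation above that models without jumps understate default risk: a continuous firm-value path needs time to reach the threshold, whereas a jump can trigger default at any moment.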
4. Change-point problems
Change-point problems have been studied extensively for the
retrospective analysis of independent univariate data. This case is too
restrictive for practical use. In process industries, for example, one
usually monitors several correlated process characteristics, while
feedback controllers produce observations that are highly correlated and
in which persistent internal process changes can only be perceived
through specific short-term patterns in the external observations. To
tackle problems of this kind, one needs to study sequential procedures,
such as generalised likelihood ratio (GLR) statistics, for multivariate
data under specific epidemic alternatives (in which the process changes
and later returns to its original state). The distribution of such
statistics is very complicated, yet it must be understood well enough to
allow implementation. Moreover, post-mortem analysis of a detected
change point should reveal the nature of the change, which is a delicate
multivariate statistical problem.
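A minimal Python sketch of a sequential GLR procedure, assuming the simplest multivariate setting: independent Gaussian observations with known pre-change mean and covariance, and a persistent shift to an unknown mean at an unknown time. It does not handle the serial correlation or the epidemic alternatives described above. The value of `threshold` is hypothetical; in practice it would be set from the distribution of the statistic under no change, which is exactly the hard part mentioned in the text.

```python
import numpy as np


def glr_alarm_time(x, mu0, cov, threshold):
    """Sequential GLR test for a persistent shift in the mean of i.i.d. N(mu0, cov) data.

    After n observations the statistic is
        max over 0 <= k < n of (n - k)/2 * m_k' cov^{-1} m_k,
    where m_k is the mean of x_{k+1} - mu0, ..., x_n - mu0: the log-likelihood
    ratio maximised over the change time and the post-change mean.
    Returns the number of observations at the first alarm, or None.
    """
    cov_inv = np.linalg.inv(cov)
    csum = np.vstack([np.zeros(x.shape[1]), np.cumsum(x - mu0, axis=0)])
    for n in range(1, x.shape[0] + 1):
        k = np.arange(n)                               # candidate change times
        lengths = (n - k)[:, None]
        means = (csum[n] - csum[k]) / lengths          # post-change sample means
        stats = 0.5 * lengths[:, 0] * np.einsum("ij,jl,il->i", means, cov_inv, means)
        if stats.max() > threshold:
            return n
    return None


# Synthetic example: 3 coordinates, mean shifts from 0 to 1 after 200 observations.
rng = np.random.default_rng(7)
x = np.vstack([rng.standard_normal((200, 3)), rng.standard_normal((200, 3)) + 1.0])
print(glr_alarm_time(x, np.zeros(3), np.eye(3), threshold=20.0))
```

Maximising over the change time makes the statistic adaptive to when the change happened, but it also makes its null distribution the maximum of many dependent chi-square variables, which is why thresholds for such procedures are difficult to calibrate.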
The four research projects are but a selection from a wide range of
possible subjects. Other topics will be added after the research staff
in mathematical finance has been enlarged, in consultation with partners
at financial institutions and economics departments. The topics cover
all three areas of stochastics (statistics, probability and operations
research) and involve some of the most advanced mathematical theory
available (e.g. stochastic integration, semimartingale theory, and
stochastic processes of many types). Because derivatives are used for
risk management by the financial industry, there is a strong interaction
between the first three projects.