**Workshop on
****
**

information theory & mathematical biology"

**November 3-7, 2008**

**EURANDOM, Eindhoven, The Netherlands**

**ABSTRACTS**

**David Coupier**

**Local configurations in the Ising model**

A d-dimensional ferromagnetic Ising model on a lattice torus is considered in which we study the occurrences of local configurations. A local configuration is merely a deterministic pattern made up positive and negative spins. As the size n of the lattice tends to infinity, the magnetic field a=a(n) and the pair potential b=b(n) depend on n and can diverge. In this context, precise bounds for the probability for local configurations to occur in a large ball are given. Moreover, a Poisson approximation for the distribution of the number of occurrences in the lattice of any given local configuration are suggested. Finally, the distance between copies of different local configurations is estimated.

**Tomasz Downarowicz**

**The "law of series" in typical processes**

By a "typical process" we will understand a process generated on any
measure-preserving transformation (with positive or zero entropy) by a partition
typical with respect to the (complete) Rokhlin metric on the space of partitions.

We prove that strong attracting is a typical behavior for events constituted by
long cylinder sets (true for any entropy) as well as for "composite events"
constituted by balls in the Hamming distance between words (true for positive
entropy).

___________________________________________________________

**Jorge Freitas**

**The link between Hitting Time Statistics and Extreme Value Theory**

We consider discrete time dynamical systems and show the link between Hitting Time Statistics and Extreme Value Theory (distribution properties of the partial maximum of stochastic processes). This relation allows to study Hitting Time Statistics with tools from Extreme Value Theory, and vice versa. We apply these results to non-uniformly hyperbolic systems and prove that a multimodal map with an absolutely continuous invariant measure must satisfy the classical extreme value laws (with no extra condition on the speed of mixing, for example). We also give applications of our theory to higher dimensional examples, for which we also obtain classical extreme value laws and exponential hitting time statistics (for balls). We extend these ideas to the subsequent returns to asymptotically small sets, linking the Poisson statistics of both processes.

___________________________________________________________

**Stefano Galatolo**

**Shrinking targets, decay of correlations and arithmetical
properties**

We will consider the time which is needed for a typical point to enter is a
sequence of decreasing balls.

In many systems the time increases as the inverse of the measure of the balls.

More precisely let $tau_r(x,x_0)$ be the time needed for a point $x$ to enter
for the first time in a ball $B_r(x_0)$ centered in $x_0$, with small radius
$r$. We consider $\liminf_{r\to 0} \frac{\log \tau _r(x,x_0)}{-\log r}$.

In systems with generic aritmetical properties, or fast decay of correlations
the above limit is related to the local dimension $d_{\mu}(x_0)$ of the
invariant measure at $x_0$, but there are systems for which this relation does
not hold at all.

We will survey positive and negative results on this question, relating it to
some other topic in the literature, as dynamical Borel Cantelli results and
Logarithm laws.

In particular we will show a class of mixing examples, constructed by
reparametrizing suitable flows for which the above limit goes to infinity (while
dimension is 3) and some recent results about the Lorenz system.

**Marc Kesseböhmer**

**Distorted critical return time processes and continued fraction **

We consider conservative ergodic measure preserving transformations on infinite measure spaces and recall some stochastic properties of certain return time processes. We then show that for the critical cases a certain distortion allows us to derive uniform laws. Our leading example will be the iteration of the Farey map, which is closely connected to continued fractions.

**Ioannis Kontoyiannis**

**Pattern matching for data compression and sequence analysis**

We review a development of parts of data compression theory and pattern-matching algorithms, centered around two main ingredients: A generalized version of the Shannon-McMillan-Breiman theorem, which in information theory is also referred to as the Asymptotic Equipartition Property (AEP); and the development of asymptotic properties for recurrence times and waiting times based on the AEP and its generalizations.

We begin by reviewing how the classical AEP underlies the analysis of the celebrated Lempel-Ziv data compression algorithm by viewing it as a random code.

In the case of approximate matching, we give various versions of the statement of the generalized AEP and we outline the general methodology of its proof via large deviations. The lossy AEP is applied to: (i) prove strengthened versions of Shannon's source coding theorem and universal coding theorems; (ii) characterize the performance of mismatched codebooks in data compression; (iii) analyze the performance of pattern-matching algorithms for data compression; (iv) determine the first order asymptotic of recurrence and waiting times between stationary processes.

A refinement to the generalized AEP is then presented, and it is used to: (i) prove second-order source coding theorems, including universal coding theorems; (ii) characterize which processes are quantitatively easier to compress; (iii) determine the second-order asymptotics of recurrence and waiting times between stationary processes; (iv) determine the precise asymptotic behavior of longest match-lengths between stationary processes.

Finally, we discuss extensions of the above framework to random fields.

**Yves Lacroix**

**On the law of series in ergodic theory**

In a discrete time ergodic aperiodic dynamical systems, one can look at the recurrence times for small measure sets. As the measure of these sets decrease to zero, within several regular enough setups, one has proved that the normalized hitting time distributions or return time distributions tend to the exponentiel law. Such limiting distribution is called an asymptotic distribution. We have proved with T. Downarowicz that in a positive entropy system any asymptotic along cylinder sets (whence for symbolic dynamical systems) must be subexponential. This turns out to be interpreted as a rule saying that clusters (grouped repetitions of visits to a set) are whatever at least as large as they can be in the independent process case, where their appearances are ruled by the exponential distribution. We also present some material saying that typicaly a symbolic dynamical system should present along a sequence of lengths of cylinders of upper density one, absolute clustering.

**Jerome Rousseau**

**Poincaré recurrence for observations**

A high dimensional dynamical system is often studied by experimentalists through the measurement of a relatively low number of different quantities, called an observation. Following this idea, for a measure preserving system, we study Poincaré recurrence for the observation. The link between the return time for the observation and the dimension of the image of the invariant measure is considered. We prove that when the decay of correlations is super polynomial, the recurrence rates for the observations and the pointwise dimensions relatively to the push-forward are equal.

**Benoît Saussol**

**An introduction to quantitative Poincaré; recurrence
in Dynamical Systems **

The purpose of this minicourse is to present some recurrence results in the context of ergodic theory and dynamical systems. The main focus will be on smooth dynamical systems, in particular those with some chaotic/hyperbolic behavior. The aim is to compute recurrence rates, limiting distributions of return times, and short returns. Some basic tools from thermodynamic formalism and dimension theory will be recalled.

**Sophie Schbath**

**The statistical world of motif occurences along DNA sequences**

Statistics of motifs have been widely revisited in the last 15 years due to the increasing availability of genomic sequences. The identification of DNA motifs with biological functions is still a huge challenge of genome analysis. Many functional and essential motifs have the particularity to be very frequent all along the chromosome or to be concentrated in some particular regions (e.g. in front of genes) or to be co-oriented with the replication direction. The prediction of functional motifs is then mostly based on statistical properties of pattern occurrences in Markovian sequences. This lecture will be mostly devoted to such properties with a special focus on pattern frequency. How to compute or approximate the count distribution to assess motif exceptionality? How to test if a motif is significantly unbalanced between two (sets of) sequences? How to deal with more complex motifs? What is the distribution of the waiting time between occurrences? How to model motif occurrences to find regions significantly enriched with a given pattern? etc. Examples of functional motifs will illustrate all these questions and we will see how the Chi motif has been identified in Staphylococcus aureus thanks to its statistical properties.

**Sandro Vaienti**

**The Rényi entropy function and the large deviation of short return
times**

We consider the Rényi entropy function for weakly -mixing systems.

The first main result proves existence and regularity properties.

The second main result of the paper is to get the decay rate for the large
deviation of the return time to cylinder sets. We show it to be exponential with
a rate given by the Réenyi entropy function. Finally we also obtain
bounds for the free energy.

**Nicolas Vergne**

**Poisson approximation for search of rare words in DNA sequences**

Using recent results on the occurrence times of a string of symbols in a stochastic process with mixing properties, we present a new method for the search of rare words in biological sequences modelled by a Markov chain. We obtain a bound on the error between the distribution of the number of occurrences of a word in a sequence and its Poisson approximation. A global bound is already given by a Chen-Stein method. Our approach, the psi-mixing method, gives local bounds. Since we only need the error in the tails of distribution, the global uniform bound of Chen-Stein is too large and it is a better way to consider local bounds. It is the first time that local bounds are devised for Poisson approximation. We search for two thresholds on the number of occurrences from which we can regard a studied word as an over-represented or an under-represented one. A biological role is suggested for these

over- or under-represented words. Our method gives such thresholds for a panel of words much broader than the Chen-Stein method which cannot give any result in a great number of cases where our method works. Comparing the methods, we observe a better accuracy for the psi-mixing method for the bound of the tails of distribution. Our method can obviously be used in domains other than biology. We also present the software PANOW (available at

__http://stat.genopole.cnrs.fr/sg/software/panow/__)
dedicated to the computation of the error term and the thresholds for a studied
word.

**Benjamin Weiss**

**Recurrence in random fields - a survey**

Basic tools in studying recurrence in classical ergodic theory is the induced
transformation first introduced by S. Kakutani and Kac's formula for the average
return time for ergodic processes. As is well known for actions of Z^{d} when
d >1 there is no natural induced transformation. Nonetheless one can formulate a
substitute for Kac's formula and it turns out to have some nice applications. I
will discuss this as well as how a (non-canonical) induced transformation can be
defined.

**Roland Zweimüller (University
of Vienna)**

**Return times and hitting times in infinite ergodic theory**

I intend to give a survey talk for non-specialists, emphasizing the prominent role which return-time distributions play in the context of infinite measure preserving (null-recurrent) dynamical systems. After reviewing their importance in structure theory and for various probabilistic questions, I will mention what little we know about the asymptotics of hitting times to shrinking target sets.

*Last modified:
24-02-09
Maintained by Lucienne Coolen*