A distribution weighting a set of laws whose initial states are grouped into classes

Servet Martinez
DOI: https://doi.org/10.48550/arXiv.1311.4850
2013-11-20
Abstract: Let $I$ be a finite alphabet and $\aS\subset I$ be a nonempty strict subset. The sequences in $I^\ZZ$ are organized into connected regions, each of which starts with a symbol in $\aS$. The regions are labelled by types $C(s)$, so that a region starting at $s'\in C(s)$ has the same type as one starting at $s$. Let $(\aP_s: s\in \aS)$ be a family of distributions on $I^\NN$, where each $\aP_s$ charges sequences starting with the symbol $s$. We can define a natural distribution $\PP$ on $I^\NN$ that counts the number of visits to the states from $\aP_s$, properly weighted. A dynamics of interest is one in which, at the first occurrence of $s'\in \aS\setminus C(s)$, the law regenerates with distribution $\aP_{s'}$. In this case we are able to find simple conditions for $\PP$ to be stationary. In addition, we study the following more complex model: once a symbol $s'\in \aS\setminus C(s)$ has been encountered, a decision is made: either a new region of type $C(s')$ governed by $\aP_{s'}$ starts, or the region continues to be a $C(s)$ region. This decision is modeled as random and depends on $s'$. In this setting a distribution similar to $\PP$ can be constructed, and conditions for its stationarity are supplied. These models are inspired by genomic sequences, where $I$ is the set of codons and the classes $(C(s): s\in \aS)$ group codons defining similar genomic classes; e.g., in bacteria there are two classes, corresponding to the start and stop codons. The random decision to continue a region or to begin a new region of a different class reflects the well-known fact that not every appearance of a start codon marks the beginning of a new coding region.
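As a rough illustration of the first dynamics described in the abstract, the sketch below splits a finite sequence into regions: a new region opens at the first symbol of $\aS$ whose class differs from the class of the current region. It covers only this deterministic segmentation rule, not the distributions $\aP_s$ or the construction of $\PP$. The function name `segment_regions`, the mapping `class_of`, and the codon assignments are assumptions made for the example and are not taken from the paper.

```python
def segment_regions(seq, class_of):
    """Split seq into regions (illustrative sketch, not the paper's construction).

    class_of is a hypothetical mapping from each symbol of S to its class label;
    symbols of I that are not in S are absent from the mapping.
    """
    regions = []
    current = None  # (class label, list of symbols) of the region being built
    for symbol in seq:
        label = class_of.get(symbol)
        if label is not None and (current is None or label != current[0]):
            # symbol lies in S and outside the current class: a new region starts
            current = (label, [symbol])
            regions.append(current)
        elif current is not None:
            # symbol is outside S, or in S with the same class: region continues
            current[1].append(symbol)
        # symbols seen before the first symbol of S belong to no region
    return regions


# Assumed toy classes: two start codons and three stop codons.
class_of = {"ATG": "start", "GTG": "start",
            "TAA": "stop", "TAG": "stop", "TGA": "stop"}
seq = ["ATG", "GCA", "ATG", "TAA", "CCC", "TGA", "GTG", "AAA"]
regions = segment_regions(seq, class_of)
for label, symbols in regions:
    print(label, symbols)
```

In this toy run the second `ATG` does not open a new region because it belongs to the class of the region already in progress. The randomized model in the abstract would replace the deterministic test with a random decision depending on the encountered symbol.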
Probability