From Letters to Words and Back: Invertible Coding of Stationary Measures

Łukasz Dębowski
2024-10-04
Abstract: Motivated by problems of statistical language modeling, we consider probability measures on infinite sequences over two countable alphabets of different cardinalities, such as letters and words. We introduce an invertible mapping between such measures, called the normalized transport, that preserves both stationarity and ergodicity. The normalized transport applies so-called self-avoiding codes that generalize comma-separated codes and specialize bijective stationary codes. The normalized transport is also connected to the usual measure transport via underlying asymptotically mean stationary measures. It preserves the ergodic decomposition. The normalized transport and self-avoiding codes arise, for instance, in the problem of successive recurrence times. In particular, we show that successive recurrence times are ergodic for an ergodic measure, which strengthens a result by Chen Moy from 1959.
Probability
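As a rough illustration of the comma-separated codes mentioned in the abstract, the following minimal sketch encodes a finite list of words as a string of letters and decodes it back. It is not the paper's normalized transport, which acts on measures over infinite sequences; the names `SEP`, `words_to_letters`, and `letters_to_words` are hypothetical, and the choice of a space as the separator letter is an assumption made only for this example.

```python
from typing import List

# Hypothetical separator letter (the "comma"); assumed never to occur inside a word.
SEP = " "


def words_to_letters(words: List[str]) -> str:
    """Encode a finite word sequence as letters, ending each word with SEP."""
    for w in words:
        if not w or SEP in w:
            raise ValueError("words must be non-empty and must not contain the separator")
    return "".join(w + SEP for w in words)


def letters_to_words(letters: str) -> List[str]:
    """Invert words_to_letters by cutting the letter string at each SEP."""
    if letters == "":
        return []
    if not letters.endswith(SEP):
        raise ValueError("a complete encoding ends with the separator")
    # split() leaves one empty piece after the final separator; drop it.
    return letters.split(SEP)[:-1]


if __name__ == "__main__":
    sample = ["from", "letters", "to", "words"]
    encoded = words_to_letters(sample)
    assert letters_to_words(encoded) == sample  # the code is invertible on this input
```

Because the separator never appears inside a word, the word boundaries can always be recovered from the letter string, which is the sense in which such a code is invertible on finite sequences.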