Abstract: The convergence rate of a Markov chain to its stationary distribution is typically assessed using the concept of total variation mixing time. However, this worst-case measure often yields pessimistic estimates and is challenging to infer from observations. In this paper, we advocate for the use of the average-mixing time as a more optimistic and demonstrably easier-to-estimate alternative. We further illustrate its applicability across a range of settings, from two-point to countable spaces, and discuss some practical implications.
What problem does this paper attempt to address?
This paper addresses how to evaluate the rate at which a Markov chain converges to its stationary distribution. Specifically, it proposes the average-mixing time as a more optimistic and easier-to-estimate alternative to the traditional total variation mixing time ($t_{\text{mix}}$), in order to overcome some of that quantity's limitations.
### Background and Motivation
The traditional $t_{\text{mix}}$ is defined with respect to the worst-case initial distribution, which often leads to a pessimistic estimate of the convergence speed. In practice, $t_{\text{mix}}$ is usually unknown, and even when theoretical upper bounds are available, they are often conservative. Estimating $t_{\text{mix}}$ from a single observed trajectory is a statistically hard problem: the required sample complexity is very high, especially when the state space is large or infinite.
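For reference, the worst-case quantity described above is usually written as follows. The notation is an assumption made here for illustration and is not taken from the summary: $P$ is the transition kernel, $\pi$ the stationary distribution, $\mathcal{X}$ the state space, and $\xi \in (0, 1)$ the error threshold.

$$
d(t) = \sup_{x \in \mathcal{X}} \left\| P^t(x, \cdot) - \pi \right\|_{\mathrm{TV}}, \qquad t_{\text{mix}}(\xi) = \min \left\{ t \ge 0 : d(t) \le \xi \right\}.
$$

The supremum over starting states $x$ is what makes the measure pessimistic: a single hard-to-leave state determines $t_{\text{mix}}$, however rarely the chain visits it.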
### Solution
The paper proposes to use the **average-mixing time** ($t^\sharp_{\text{mix}}$) as a more optimistic alternative for evaluating the convergence speed of Markov chains. Specifically:
1. **Definition and Properties**:
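- The average-mixing time $t^\sharp_{\text{mix}}$ is defined as the minimum time at which the total variation distance to the stationary distribution, averaged over initial states drawn from that stationary distribution, falls below a given error threshold.
- Because it averages over starting states instead of taking the worst case, $t^\sharp_{\text{mix}}$ is never larger than $t_{\text{mix}}$ at the same threshold and can be significantly smaller, especially in small state spaces.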
2. **Estimation Method**:
- The paper proposes a method for estimating $t^\sharp_{\text{mix}}$ from a single trajectory and proves that this method is statistically more efficient than estimating $t_{\text{mix}}$.
- In the uniformly ergodic setting in particular, the paper provides specific upper bounds on sample complexity, indicating that the average-mixing time can be estimated with a sub-linear number of samples.
3. **Theoretical Results**:
- The paper establishes the relationship between the average-mixing time and the relaxation time through spectral methods and geometric ergodicity assumptions.
- It further explores the connection between the average-mixing time and the β-mixing coefficients, proving that the average-mixing time can be used to control the deviation of functions on Markov chains.
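The gap between the two quantities can be illustrated on a two-point chain whose transition matrix is known. The following is a minimal sketch, not the paper's code: the function names, the threshold `xi = 0.25`, and the example matrix are assumptions chosen for illustration, and the average-mixing time is taken to be the first time at which the stationary-weighted average of the total variation distances drops below `xi`.

```python
import numpy as np


def stationary_distribution(P):
    """Left eigenvector of P for the eigenvalue closest to 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()


def mixing_times(P, xi=0.25, t_max=10_000):
    """Return (worst-case mixing time, average-mixing time) at threshold xi.

    Either entry is None if the threshold is not reached within t_max steps.
    """
    pi = stationary_distribution(P)
    Pt = np.eye(P.shape[0])
    t_worst = t_avg = None
    for t in range(1, t_max + 1):
        Pt = Pt @ P
        # Total variation distance to pi from each starting state (one per row).
        tv = 0.5 * np.abs(Pt - pi).sum(axis=1)
        if t_worst is None and tv.max() <= xi:
            t_worst = t
        if t_avg is None and pi @ tv <= xi:
            t_avg = t
        if t_worst is not None and t_avg is not None:
            break
    return t_worst, t_avg


# Hypothetical two-state chain: state 0 is sticky, state 1 is rarely visited.
P_example = np.array([[0.999, 0.001],
                      [0.100, 0.900]])
t_worst, t_avg = mixing_times(P_example)
print("worst-case mixing time:", t_worst)
print("average-mixing time:   ", t_avg)
```

In this example the worst case is driven by the rarely visited state 1, which carries about 1% of the stationary mass, so weighting by the stationary distribution largely removes its influence.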
### Practical Significance
- **Machine Learning and Statistical Inference**: The average-mixing time can be used to analyze machine learning algorithms for weakly dependent data, providing more accurate generalization bounds and regret bounds.
- **Markov Chain Monte Carlo Methods**: In MCMC methods, the average-mixing time can be used as an effective tool for diagnosing convergence, especially when the state space is large or infinite.
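To give a feel for what a trajectory-based diagnostic might look like on a small finite state space, here is a naive plug-in sketch. It is not the estimator proposed in the paper and carries none of its guarantees: it simply replaces the transition matrix and stationary distribution with empirical frequencies from one trajectory. All function names, the threshold, and the simulated chain are hypothetical.

```python
import numpy as np


def empirical_chain(trajectory, n_states):
    """Empirical transition matrix and state frequencies from one trajectory."""
    trajectory = np.asarray(trajectory, dtype=int)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (trajectory[:-1], trajectory[1:]), 1)
    visits = counts.sum(axis=1)
    # Rows of never-visited states fall back to the uniform distribution.
    P_hat = np.where(visits[:, None] > 0,
                     counts / np.maximum(visits, 1)[:, None],
                     1.0 / n_states)
    pi_hat = np.bincount(trajectory, minlength=n_states) / len(trajectory)
    return P_hat, pi_hat


def plug_in_average_mixing_time(trajectory, n_states, xi=0.25, t_max=1000):
    """First t with sum_x pi_hat(x) * TV(P_hat^t(x, .), pi_hat) <= xi, else None."""
    P_hat, pi_hat = empirical_chain(trajectory, n_states)
    Pt = np.eye(n_states)
    for t in range(1, t_max + 1):
        Pt = Pt @ P_hat
        tv = 0.5 * np.abs(Pt - pi_hat).sum(axis=1)
        if pi_hat @ tv <= xi:
            return t
    return None


def simulate(P, n_steps, rng, start=0):
    """Sample a single trajectory of length n_steps from transition matrix P."""
    states = np.empty(n_steps, dtype=int)
    states[0] = start
    for i in range(1, n_steps):
        states[i] = rng.choice(P.shape[0], p=P[states[i - 1]])
    return states


# Hypothetical usage on a simulated two-state chain.
rng = np.random.default_rng(0)
P_true = np.array([[0.95, 0.05],
                   [0.30, 0.70]])
trajectory = simulate(P_true, 5000, rng)
print("plug-in average-mixing time:", plug_in_average_mixing_time(trajectory, 2))
```

A plug-in of this kind only makes sense when every state is visited often enough for its empirical row to be reliable, which is exactly what fails in large or infinite state spaces.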
In conclusion, by introducing and analyzing the average-mixing time, this paper provides a new and more practical framework for evaluating the convergence speed of Markov chains and addresses several limitations of the traditional worst-case approach.