Some asymptotic behaviors for the plug-in estimator of entropy

Zhenhong Yu, Yu Miao
2024-09-28
Abstract: In the present paper, we consider the plug-in estimator of Shannon's entropy defined on a finite alphabet which is assumed to vary dynamically as the sample size increases. Asymptotic behaviors of the plug-in estimator, such as asymptotic normality, a Berry-Esseen bound, and a moderate deviation principle, are established.
Probability
What problem does this paper attempt to address?
This paper addresses the asymptotic behavior of the plug-in estimator of Shannon entropy on a finite alphabet that changes dynamically with the sample size. Specifically, the authors study asymptotic normality, a Berry-Esseen bound, and a moderate deviation principle for the plug-in estimator when the alphabet is allowed to vary as the sample size increases.

### Problem Background

Shannon entropy \(H\) is defined as:

\[ H = E(-\ln p(X)) = -\sum_{i = 1}^{K}p(i)\ln p(i) \]

where \(X\) is a discrete random variable with an unknown distribution \(\{p(i), i\in \mathcal{X}\}\) on the alphabet \(\mathcal{X}=\{i, 1\leq i\leq K\}\), and \(K\) is a finite integer or infinity. The plug-in estimator \(\hat{H}_n\) is defined as:

\[ \hat{H}_n=-\sum_{i = 1}^{K}\hat{p}_n(i)\ln\hat{p}_n(i) \]

where \(\hat{p}_n(i)=\frac{1}{n}\sum_{j = 1}^nI\{X_j = i\}\) is the empirical distribution induced by the sample \((X_1, X_2,\cdots, X_n)\).

### Main Research Contents

1. **Asymptotic Normality**: For a fixed and finite \(K\), Basharin [2] gave the central limit theorem:

   \[ \sqrt{n}\frac{\hat{H}_n - H}{\sigma}\xrightarrow{D}N(0, 1) \]

   where \(\sigma^2=\text{Var}(\ln p(X_1))> 0\).
2. **Dynamically Changing \(K(n)\)**: When \(K = K(n)\) changes as the sample size \(n\) increases, Paninski [10] proved that:

   \[ \sqrt{n}\frac{\hat{H}_n - H_n}{\sigma_n}\xrightarrow{D}N(0, 1) \]

   where \(H_n\) is the Shannon entropy of the distribution \(\{p_n(i), 1\leq i\leq K(n), n\geq 1\}\), and \(\sigma_n^2=\text{Var}(\ln p_n(X_{1,n}))\).
3. **Infinite \(K\)**: Antos and Kontoyiannis [1] studied the convergence rate of \(\hat{H}_n\) under different tail conditions and pointed out that no universal rate of convergence exists for any sequence of estimators.
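
The definitions of \(\hat{H}_n\) and \(\sigma^2\) above can be checked numerically. The following is a minimal sketch, assuming a fixed finite alphabet (the classical setting of item 1, not the dynamically varying \(K(n)\) case). The function names and the example distribution are illustrative choices and do not come from the paper. It computes the plug-in estimate from a sample and simulates the standardized statistic \(\sqrt{n}(\hat{H}_n - H)/\sigma\), whose empirical mean and variance should be roughly 0 and 1 for large \(n\).

```python
import math
import random
from collections import Counter


def plugin_entropy(sample):
    """Plug-in estimate of Shannon entropy (in nats) from a list of symbols."""
    n = len(sample)
    counts = Counter(sample)
    # Symbols that never occur contribute nothing (convention 0 * ln 0 = 0).
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def true_entropy_and_sigma(p):
    """Return H = -sum p ln p and sigma = sqrt(Var(ln p(X))) for a known p."""
    h = -sum(q * math.log(q) for q in p if q > 0)
    second_moment = sum(q * math.log(q) ** 2 for q in p if q > 0)
    # Var(ln p(X)) = E[(ln p(X))^2] - H^2; clamp tiny negative rounding errors.
    sigma = math.sqrt(max(second_moment - h ** 2, 0.0))
    return h, sigma


def standardized_statistics(p, n, reps, seed=0):
    """Simulate sqrt(n) * (H_hat_n - H) / sigma over `reps` independent samples.

    Requires sigma > 0, so p must not be uniform on its support.
    """
    rng = random.Random(seed)
    h, sigma = true_entropy_and_sigma(p)
    symbols = range(len(p))
    stats = []
    for _ in range(reps):
        sample = rng.choices(symbols, weights=p, k=n)
        stats.append(math.sqrt(n) * (plugin_entropy(sample) - h) / sigma)
    return stats


if __name__ == "__main__":
    # Hypothetical three-symbol distribution, chosen only for illustration.
    stats = standardized_statistics([0.5, 0.3, 0.2], n=2000, reps=300, seed=1)
    mean = sum(stats) / len(stats)
    var = sum((s - mean) ** 2 for s in stats) / (len(stats) - 1)
    print(f"empirical mean = {mean:.3f}, empirical variance = {var:.3f}")
```

The condition \(\sigma^2 > 0\) matters in practice: for a uniform distribution \(\ln p(X)\) is constant, so \(\sigma = 0\) and the standardization above is undefined.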
### Paper Contribution

Building on the above work, this paper continues the study of the plug-in estimator of Shannon entropy defined on a finite alphabet that changes dynamically as the sample size increases, and establishes its asymptotic normality, a Berry-Esseen bound, and a moderate deviation principle.

### Conclusion

Through these results, the authors aim to better understand the statistical properties of the plug-in estimator of Shannon entropy in more complex situations, such as when the alphabet size changes with the sample size, and thereby provide theoretical support for practical applications.