Arun G. Chandrasekhar, Matthew O. Jackson, Tyler H. McCormick, Vydhourie Thiyageswaran
Abstract: We present a general central limit theorem with simple, easy-to-check covariance-based sufficient conditions for triangular arrays of random vectors when all variables could be interdependent. The result is constructed from Stein's method, but the conditions are distinct from related work. We show that these covariance conditions nest standard assumptions studied in the literature such as $M$-dependence, mixing random fields, non-mixing autoregressive processes, and dependency graphs, which themselves need not imply each other. This permits researchers to work with high-level but intuitive conditions based on overall correlation instead of more complicated and restrictive conditions such as strong mixing in random fields that may not have any obvious micro-foundation. As examples of the implications, we show how the theorem implies asymptotic normality in estimating: treatment effects with spillovers in more settings than previously admitted, covariance matrices, processes with global dependencies such as epidemic spread and information diffusion, and spatial processes with Matérn dependencies.
What problem does this paper attempt to address?
The paper asks how to provide general, easily verifiable conditions for a central limit theorem (CLT) when the random vectors are interdependent. Specifically, the authors propose covariance-based conditions under which the appropriately normalized sample mean is asymptotically normal, even when all variables may be interdependent.
### Main problems and background
1. **Limitations of existing methods**:
- Existing CLT conditions usually require checking assumptions tailored to particular data structures. These assumptions can be hard to justify on scientific grounds, and they either impose strict structural restrictions (for example, strong mixing) or allow more flexibility while limiting the range of correlations (for example, sparse dependency graphs).
- These conditions often lack an obvious micro-foundation. In applications with correlation structures such as agricultural shocks, infectious disease spread, and information diffusion, it is difficult to establish that they hold.
2. **Research objectives**:
- Propose new, more general CLT conditions based on overall correlation rather than on a specific dependence structure.
- Enable researchers to verify these conditions by reasoning directly about correlations, so that the result is easier to apply and accessible to a wider range of researchers.
### Main contributions of the paper
1. **Simple and easy-to-use conditions**:
- Propose three high-level, easy-to-interpret covariance-based conditions that researchers can understand and verify intuitively.
2. **Wide applicability**:
- The conditions nest multiple dependence structures, including M-dependence, mixing random fields, non-mixing autoregressive processes, and dependency graphs, which do not necessarily imply each other.
3. **Extended application range**:
- Covers applications beyond those handled in previous literature, such as treatment effects with spillovers, processes with global dependencies (such as infectious disease spread and information diffusion), and spatial processes with Matérn dependencies.
4. **Combination of theory and practice**:
- Provides detailed mathematical derivations and case analyses, demonstrating the effectiveness of the new method in different application scenarios.
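To make the M-dependence case listed above concrete, the following Python sketch simulates a 1-dependent moving-average process with skewed innovations and checks that its sum, divided by its exact standard deviation, behaves roughly like a standard normal. This is an illustration and not code from the paper: the MA(1) process, the function name `simulate_normalized_sums`, and all parameter values are assumptions chosen for the example.

```python
import numpy as np

def simulate_normalized_sums(n, theta, reps, rng):
    """Draw `reps` normalized sums of a 1-dependent MA(1) process (illustrative)."""
    # Centered exponential innovations: mean 0, variance 1, clearly non-Gaussian.
    e = rng.exponential(1.0, size=(reps, n + 1)) - 1.0
    # Z_i = e_i + theta * e_{i+1}: Z_i and Z_j are independent once |i - j| > 1.
    z = e[:, :-1] + theta * e[:, 1:]
    # Exact variance of the sum: n variances plus 2 * (n - 1) adjacent covariances.
    var_sum = n * (1 + theta**2) + 2 * (n - 1) * theta
    return z.sum(axis=1) / np.sqrt(var_sum)

rng = np.random.default_rng(0)
s = simulate_normalized_sums(n=400, theta=0.5, reps=5000, rng=rng)

# Share of draws inside the central 95% interval of a standard normal.
coverage = np.mean(np.abs(s) <= 1.96)
print(f"mean={s.mean():.3f} var={s.var():.3f} coverage={coverage:.3f}")
```

If the normal approximation is adequate, the printed mean should be near 0, the variance near 1, and the coverage near 0.95. A simulation like this only illustrates the limit for one process; it does not verify the paper's conditions.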
### Summary of mathematical expressions
The key formulas in the paper are as follows:
1. **Total covariance**:
\[
\Omega_n := \sum_{i = 1}^n\sum_{j\in A_n(i)}\text{cov}(Z_i, Z_j)
\]
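where \(A_n(i)\) is the affinity set of the \(i\)-th random variable, that is, the set of indices \(j\) whose variables \(Z_j\) may be relatively highly correlated with \(Z_i\).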
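As a rough illustration of this definition, the sketch below computes \(\Omega_n\) for scalar variables from a known covariance matrix and a list of affinity sets. The helper names (`total_covariance`, `ma1_covariance`, `band_affinity`) and the MA(1) example are hypothetical and chosen for illustration; they do not come from the paper.

```python
import numpy as np

def total_covariance(cov, affinity):
    """Sum cov[i, j] over every i and every j in affinity[i] (scalar Z_i)."""
    return sum(cov[i, j] for i, members in enumerate(affinity) for j in members)

def ma1_covariance(n, theta):
    """Covariance matrix of Z_i = e_i + theta * e_{i+1}, e i.i.d. with unit variance."""
    cov = np.zeros((n, n))
    idx = np.arange(n)
    cov[idx, idx] = 1.0 + theta**2      # var(Z_i)
    cov[idx[:-1], idx[1:]] = theta      # cov(Z_i, Z_{i+1})
    cov[idx[1:], idx[:-1]] = theta      # symmetric entry
    return cov

def band_affinity(n, width):
    """Affinity sets A_n(i) = {j : |i - j| <= width}, clipped to 0..n-1."""
    return [range(max(0, i - width), min(n, i + width + 1)) for i in range(n)]

n, theta = 200, 0.5
cov = ma1_covariance(n, theta)
omega_n = total_covariance(cov, band_affinity(n, 1))

# Closed form for this example: n variances plus 2 * (n - 1) adjacent covariances.
closed_form = n * (1 + theta**2) + 2 * (n - 1) * theta
print(omega_n, closed_form)
```

In this example the affinity sets capture every nonzero covariance, so \(\Omega_n\) coincides with the variance of \(\sum_i Z_i\). With smaller affinity sets (for instance `band_affinity(n, 0)`), \(\Omega_n\) would include only part of that variance.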
2. **Condition 1: Covariance control within the affinity set**:
\[
\sum_{i}\sum_{j,k\in A_n(i)}E[|Z_i|Z_jZ_k]=o((\|\Omega_n\|_F)^{3/2})
\]
3. **Condition 2: Covariance control across affinity sets**:
\[
\sum_{i,j}\sum_{k\in A_n(i),l\in A_n(j)}\text{cov}(Z_iZ_k,Z_jZ_l)=o((\|\Omega_n\|_F)^2)
\]
4. **Condition 3: Covariance control outside the affinity set**:
\[
\sum_{i}E(|Z^{-i}E(Z_i|Z^{-i})|)=o(\|\Omega_n\|_F)
\]
Together, these conditions ensure that the appropriately normalized sample mean remains asymptotically normal even under highly complex dependence structures.
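As a sanity check (a worked example added here, not taken from the paper), consider the simplest case: scalar \(Z_1,\dots,Z_n\) that are i.i.d. with mean zero, variance \(\sigma^2>0\), and finite fourth moment, with affinity sets \(A_n(i)=\{i\}\). The check assumes that \(Z^{-i}\) is a function only of the variables outside \(A_n(i)\) (for example, their sum), which is a reading of the notation above rather than something this summary states. In the scalar case \(\|\Omega_n\|_F=|\Omega_n|\), and
\[
\Omega_n=\sum_{i=1}^n\text{cov}(Z_i,Z_i)=n\sigma^2 .
\]
Condition 1 reduces to a third-moment bound:
\[
\sum_{i}E[|Z_i|Z_iZ_i]=nE[|Z_1|^3]=O(n)=o(n^{3/2}).
\]
For Condition 2, independence removes every term with \(i\neq j\):
\[
\sum_{i,j}\text{cov}(Z_i^2,Z_j^2)=n\,\text{var}(Z_1^2)=O(n)=o(n^2).
\]
For Condition 3, \(Z_i\) is independent of \(Z^{-i}\), so \(E(Z_i|Z^{-i})=E(Z_i)=0\) and
\[
\sum_{i}E(|Z^{-i}E(Z_i|Z^{-i})|)=0=o(n).
\]
All three conditions hold, which matches the classical i.i.d. CLT, although this check uses a finite fourth moment, which is stronger than the classical result needs.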