Abstract: In the high-dimensional landscape, addressing the challenges of covariance regression with high-dimensional covariates has posed difficulties for conventional methodologies. This paper addresses these hurdles by presenting a novel approach for high-dimensional inference with covariance matrix outcomes. The proposed methodology is illustrated through its application in elucidating brain coactivation patterns observed in functional magnetic resonance imaging (fMRI) experiments and unraveling complex associations within anatomical connections between brain regions identified through diffusion tensor imaging (DTI). In the pursuit of dependable statistical inference, we introduce an integrative approach based on penalized estimation. This approach combines data splitting, variable selection, aggregation of low-dimensional estimators, and robust variance estimation. It enables the construction of reliable confidence intervals for covariate coefficients, supported by theoretical confidence levels under specified conditions, where asymptotic distributions are provided. Through various types of simulation studies, the proposed approach performs well for covariance regression in the presence of high-dimensional covariates. This innovative approach is applied to the Lifespan Human Connectome Project (HCP) Aging Study, which aims to uncover a typical aging trajectory and variations in the brain connectome among mature and older adults. The proposed approach effectively identifies brain networks and associated predictors of white matter integrity, aligning with established knowledge of the human brain.
What problem does this paper attempt to address?
This paper addresses the challenges of high-dimensional covariance regression, especially when the covariate dimension is much larger than the sample size. Specifically, it proposes a new method for regression problems in which the outcomes are covariance matrices and the covariates are high-dimensional. Traditional methods break down in this setting: when the number of covariates exceeds the sample size, the sample covariance matrix of the covariates is rank-deficient, so conventional estimation methods fail.
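The rank-deficiency point can be checked numerically. The following is a minimal sketch on simulated Gaussian covariates; the sizes `n = 20` and `q = 100` are arbitrary choices for illustration and do not come from the paper.

```python
import numpy as np

# Simulated covariates: n observations, q covariates, with q much larger than n.
# The sizes are illustrative assumptions, not values from the paper.
rng = np.random.default_rng(0)
n, q = 20, 100
X = rng.standard_normal((n, q))

# Sample covariance matrix of the covariates (q x q).
S = np.cov(X, rowvar=False)

# After centering, the rank can be at most n - 1, far below q,
# so S is singular and cannot be inverted.
rank = np.linalg.matrix_rank(S)
print(f"q = {q}, rank of sample covariance = {rank}")
```

Because the sample covariance is singular whenever `q` exceeds `n - 1`, any estimator that requires inverting it is unavailable without regularization or dimension reduction.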
### Main problems solved in the paper:
1. **High-dimensional covariance regression**:
- When the dimension \( q \) of the covariates is much larger than the sample size \( n \), traditional covariance regression methods struggle to work effectively. The paper proposes a new method that uses techniques such as penalized estimation and data splitting to enable valid statistical inference in high-dimensional settings.
2. **Reliable statistical inference**:
- The paper goes beyond parameter estimation and proposes a framework for constructing reliable confidence intervals. The framework combines data splitting, variable selection, aggregation of low-dimensional estimates, and robust variance estimation to support reliable inference in high-dimensional settings.
3. **Practical applications**:
- The paper demonstrates the method on practical problems, especially in neuroimaging. For example, it studies synchronous activation patterns between brain regions observed in functional magnetic resonance imaging (fMRI) experiments, and anatomical connections between brain regions identified through diffusion tensor imaging (DTI). The method has also been applied in financial data analysis to study the synchronicity of stock prices.
### Main contributions:
1. **Penalized estimation and statistical inference**:
- The paper is described as the first attempt at penalized estimation and statistical inference for covariance regression with high-dimensional covariates, laying a foundation for handling covariance matrix outcomes in complex high-dimensional data.
2. **Aggregation of low-dimensional models**:
- By fitting and aggregating low-dimensional models, the method avoids the computational burden of direct high-dimensional inference and keeps the procedure computationally efficient.
3. **Robust variance estimation**:
- The paper develops a variance estimator based on the infinitesimal jackknife. It does not depend on parametric assumptions and can produce confidence intervals with correct coverage probabilities, supporting reliable statistical inference.
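To illustrate the infinitesimal jackknife idea mentioned above, here is a minimal sketch of its classic form for a bootstrap-averaged statistic: the variance estimate is the sum, over observations, of the squared covariance between each observation's resampling count and the resampled statistic. This is an assumption-laden toy, not the paper's estimator; the paper works with data splitting, so its version may differ in form and scaling. The function name `ij_variance_bagged_mean` and the use of the sample mean as the statistic are hypothetical choices made for illustration.

```python
import numpy as np

def ij_variance_bagged_mean(x, n_boot=5000, seed=0):
    """Infinitesimal jackknife variance for a bootstrap-averaged sample mean.

    Illustrative only: the statistic (a mean) and the bootstrap resampling
    scheme are assumptions, not the paper's setup.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    # counts[b, i] = number of times observation i appears in resample b.
    counts = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)
    # t[b] = statistic computed on resample b (here, the resample mean).
    t = counts @ x / n
    # Covariance, across resamples, between each count and the statistic.
    cov = ((counts - 1.0) * (t - t.mean())[:, None]).mean(axis=0)
    return float(np.sum(cov ** 2))

rng = np.random.default_rng(1)
x = rng.standard_normal(50)

v_ij = ij_variance_bagged_mean(x)
# For the mean, the estimate should be close to the plug-in variance of the mean.
v_plugin = float(np.sum((x - x.mean()) ** 2) / len(x) ** 2)
print(v_ij, v_plugin)
```

No distributional model for the data enters the calculation, which is the sense in which such an estimator is free of parametric assumptions. With a finite number of resamples the estimate carries a small upward Monte Carlo bias.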
### Method overview:
1. **Penalized estimation**:
- Use the entire data set to estimate the model parameters \( \beta \) and \( \gamma \), applying a penalty to regularize the high-dimensional coefficient \( \beta \).
2. **High-dimensional inference**:
- Adopt a data-splitting strategy: divide the sample into two subsets, one for dimension reduction and the other for fitting a low-dimensional model. Repeating the random split many times and aggregating the results improves the stability and reliability of the estimate.
3. **Theoretical properties**:
- Prove the consistency of the estimators and the consistency of model selection under certain conditions, and derive the asymptotic properties of the estimators for a single split.
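The split, select, refit, and aggregate pattern in the overview above can be sketched in a few lines. This sketch makes several loud assumptions: an ordinary linear regression stands in for the paper's covariance regression model, marginal correlation screening stands in for penalized variable selection, and the function name `split_select_refit`, the parameter `top_k`, and the simulated data are all hypothetical.

```python
import numpy as np

def split_select_refit(X, y, target, n_splits=50, top_k=5, seed=0):
    """Estimate one coefficient by repeated sample splitting.

    Stand-in assumptions: linear model, correlation screening for selection.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    estimates = np.empty(n_splits)
    for b in range(n_splits):
        perm = rng.permutation(n)
        sel_idx, fit_idx = perm[: n // 2], perm[n // 2:]

        # Half 1: dimension reduction by marginal correlation screening.
        Xs = X[sel_idx] - X[sel_idx].mean(axis=0)
        ys = y[sel_idx] - y[sel_idx].mean()
        score = np.abs(Xs.T @ ys) / (np.linalg.norm(Xs, axis=0) * np.linalg.norm(ys))
        chosen = set(np.argsort(score)[-top_k:].tolist())
        chosen.discard(target)
        cols = [target] + sorted(chosen)  # always keep the covariate of interest

        # Half 2: low-dimensional least-squares fit on the selected covariates.
        design = np.column_stack([np.ones(len(fit_idx)), X[fit_idx][:, cols]])
        coef, *_ = np.linalg.lstsq(design, y[fit_idx], rcond=None)
        estimates[b] = coef[1]  # coefficient of the target covariate

    # Aggregate the per-split estimates by averaging.
    return float(estimates.mean()), estimates

# Simulated example: q = 500 covariates, n = 200 observations, 3 true signals.
rng = np.random.default_rng(42)
n, q = 200, 500
X = rng.standard_normal((n, q))
y = 1.0 * X[:, 0] + 0.8 * X[:, 1] - 0.8 * X[:, 2] + 0.5 * rng.standard_normal(n)

beta_hat, per_split = split_select_refit(X, y, target=0)
print(f"aggregated estimate for covariate 0: {beta_hat:.3f}")
```

Using separate halves for selection and fitting keeps the refit from reusing the data that chose the model, and averaging over many random splits reduces the dependence of the result on any single split.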
### Application examples:
- **Neuroimaging**: Applied to the Lifespan Human Connectome Project (HCP) Aging Study to study the changes in the brain connectome of mature and elderly people, and identify brain networks and their related predictors.
- **Financial data analysis**: Analyze the synchronicity of stock prices and explore the impact of relevant attributes at the company and market levels on stock synchronicity.
Through these methods and applications, the paper provides a comprehensive and effective solution to the high-dimensional covariance regression problem.