Boxin Zhao, Percy S. Zhai, Y. Samuel Wang, Mladen Kolar
Abstract: Undirected graphical models are widely used to model the conditional independence structure of vector-valued data. However, in many modern applications, for example those involving EEG and fMRI data, observations are more appropriately modeled as multivariate random functions rather than vectors. Functional graphical models have been proposed to model the conditional independence structure of such functional data. We propose a neighborhood selection approach to estimate the structure of Gaussian functional graphical models, where we first estimate the neighborhood of each node via a function-on-function regression and subsequently recover the entire graph structure by combining the estimated neighborhoods. Our approach only requires assumptions on the conditional distributions of random functions, and we estimate the conditional independence structure directly. We thus circumvent the need for a well-defined precision operator that may not exist when the functions are infinite dimensional. Additionally, the neighborhood selection approach is computationally efficient and can be easily parallelized. The statistical consistency of the proposed method in the high-dimensional setting is supported by both theory and experimental results. In addition, we study the effect of the choice of the function basis used for dimensionality reduction in an intermediate step. We give a heuristic criterion for choosing a function basis and motivate two practically useful choices, which we justify by both theory and experiments.
What problem does this paper attempt to address?
The paper addresses the problem of estimating the conditional independence structure of multivariate random functions in the high-dimensional setting. Specifically, the authors propose a neighborhood selection method for estimating the structure of Gaussian functional graphical models. Unlike traditional graphical models, these models describe multivariate random functions rather than vector-valued data. The main contribution is a method that estimates the conditional independence structure directly, without requiring a precision operator, which may not be well defined in the infinite-dimensional case.
### Background and Motivation of the Paper
1. **Background**:
- Undirected graphical models are widely used to model the conditional independence structure of vector-valued data.
- However, in many modern applications, such as those involving electroencephalogram (EEG) and functional magnetic resonance imaging (fMRI) data, the observations are more appropriately modeled as multivariate random functions rather than vectors.
- Functional graphical models (FGMs) have been proposed to model the conditional independence structure of such functional data.
2. **Motivation**:
- The authors focus on the problem of estimating the conditional independence structure of multivariate random functions.
- Existing methods, such as the functional graphical lasso (fglasso), assume that the data are finite-dimensional and rely on a precision operator, which may not be well defined in the infinite-dimensional case.
- Therefore, the authors propose a neighborhood selection method that estimates the conditional independence structure directly, without the need to define a precision operator.
### Main Contributions of the Paper
1. **Method**:
- A neighborhood selection method is proposed to estimate the structure of Gaussian functional graphical models.
- The neighborhood of each node is estimated through a function-on-function regression, and the estimated neighborhoods are then combined to recover the entire graph structure.
- The method only requires assumptions on the conditional distributions of the random functions and estimates the conditional independence structure directly, thus avoiding the need for a well-defined precision operator.
2. **Theoretical Contributions**:
- Non-asymptotic theoretical guarantees are provided, including finite-sample bounds on the graph recovery error.
- The residual term introduced by the finite-dimensional approximation is analyzed to ensure the theoretical validity of the method.
3. **Practical Applications**:
- The method is computationally efficient in high-dimensional settings and can be easily parallelized, since each node's neighborhood is estimated separately.
- Experiments show that the method is effective on real data sets, in particular fMRI data, where it reveals functional connectivity patterns in ADHD patients and in the control group.
### Method Overview
1. **Functional Graphical Model**:
- An undirected graph \( G=(V, E) \) is defined, where \( V \) is the set of nodes and \( E \) is the set of edges.
- The edge set \( E \) encodes the pairwise Markov property of the multivariate random function \( g \).
2. **Neighborhood Selection**:
- For each node \( j \), its neighborhood \( N_j \) is estimated through a function-on-function regression.
- Specifically, for the target node \( j \), each observed function is projected onto a set of basis functions and represented as a finite-dimensional vector, and the regression coefficient matrix \( B_{k,M}^* \) is then estimated through a vector-on-vector regression.
- Finally, the neighborhood \( \hat{N}_j \) is determined through the estimated regression coefficient matrix.
3. **Optimization Algorithm**:
- The regression problem is solved by least squares with a group lasso penalty.
- The choice of function basis affects the accuracy of the estimate; the paper gives a heuristic criterion for choosing a basis and motivates two practically useful choices.
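The steps in this overview can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the functions are assumed to be already projected onto \( M \) basis functions, so each node is an \( n \times M \) matrix of coefficients; the group lasso problem is solved with a plain proximal gradient loop (a solver chosen here for brevity); the estimated neighborhoods are combined with an AND or OR rule, as is common in neighborhood selection; and the names `group_lasso_regression` and `estimate_graph`, the toy data, and the value of `lam` are all hypothetical.

```python
import numpy as np

def group_lasso_regression(X, Y, n_groups, lam, n_iter=500):
    """Proximal gradient for (1/2n)||Y - X B||_F^2 + lam * sum_k ||B_k||_F,
    where B_k is the k-th block of consecutive rows of B (one block per node)."""
    n, d = X.shape
    M = d // n_groups                        # rows per block
    B = np.zeros((d, Y.shape[1]))
    step = n / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        Z = B - step * (X.T @ (X @ B - Y) / n)   # gradient step on the squared loss
        for k in range(n_groups):                # block-wise soft-thresholding
            blk = Z[k * M:(k + 1) * M]
            norm = np.linalg.norm(blk)           # Frobenius norm of the block
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            B[k * M:(k + 1) * M] = scale * blk
    return B

def estimate_graph(scores, lam, rule="and"):
    """scores has shape (n, p, M): n samples, p nodes, M basis coefficients.
    Returns a symmetric boolean adjacency matrix of shape (p, p)."""
    n, p, M = scores.shape
    selected = np.zeros((p, p), dtype=bool)
    for j in range(p):                           # independent per node, so parallelizable
        others = [k for k in range(p) if k != j]
        X = scores[:, others, :].reshape(n, (p - 1) * M)
        Y = scores[:, j, :]
        B = group_lasso_regression(X, Y, p - 1, lam)
        for idx, k in enumerate(others):
            # node k is in the estimated neighborhood of j if its block is nonzero
            selected[j, k] = np.linalg.norm(B[idx * M:(idx + 1) * M]) > 0
    if rule == "and":
        return selected & selected.T
    return selected | selected.T

# Toy data (assumption): node 1 is a noisy copy of node 0, node 2 is independent.
rng = np.random.default_rng(0)
n, M = 200, 3
a0 = rng.standard_normal((n, M))
a1 = a0 + 0.1 * rng.standard_normal((n, M))
a2 = rng.standard_normal((n, M))
scores = np.stack([a0, a1, a2], axis=1)          # shape (n, 3, M)

adj = estimate_graph(scores, lam=0.5)
print(adj.astype(int))
```

Because each node's regression is solved independently, the loop over `j` in `estimate_graph` could be distributed across workers without changing the result. In practice the penalty level `lam` and the number of basis functions `M` would need to be tuned, which this sketch does not attempt.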
### Conclusion
This paper proposes a neighborhood selection method for estimating the structure of Gaussian functional graphical models. The method comes with non-asymptotic theoretical guarantees and performs well in experiments, especially on high-dimensional functional data.