High Dimensional Bayesian Network Classification with Network Global-Local Shrinkage Priors

Sharmistha Guha,Abel Rodriguez
DOI: https://doi.org/10.48550/arXiv.2009.11401
2020-09-24
Abstract:This article proposes a novel Bayesian classification framework for networks with labeled nodes. While literature on statistical modeling of network data typically involves analysis of a single network, the recent emergence of complex data in several biological applications, including brain imaging studies, presents a need to devise a network classifier for subjects. This article considers an application from a brain connectome study, where the overarching goal is to classify subjects into two separate groups based on their brain network data, along with identifying influential regions of interest (ROIs) (referred to as nodes). Existing approaches either treat all edge weights as a long vector or summarize the network information with a few summary measures. Both these approaches ignore the full network structure, may lead to less desirable inference in small samples and are not designed to identify significant network nodes. We propose a novel binary logistic regression framework with the network as the predictor and a binary response, the network predictor coefficient being modeled using a novel class global-local shrinkage priors. The framework is able to accurately detect nodes and edges in the network influencing the classification. Our framework is implemented using an efficient Markov Chain Monte Carlo algorithm. Theoretically, we show asymptotically optimal classification for the proposed framework when the number of network edges grows faster than the sample size. The framework is empirically validated by extensive simulation studies and analysis of a brain connectome data.
Methodology,Applications
What problem does this paper attempt to address?
The problem that this paper attempts to solve is to perform classification in high - dimensional network data and simultaneously identify the nodes and edges that have a significant impact on the classification results. Specifically, the paper focuses on how to effectively classify samples (such as individuals) into two different groups (for example, classify according to high or low IQ) given network data with labeled nodes (such as brain connectome data), and be able to identify which brain regions (nodes) and their connections (edges) have an important impact on the classification results. The paper proposes a new Bayesian classification framework, which utilizes network global - local shrinkage priors. Through this method, the paper aims to overcome several limitations in existing methods: 1. **Limitations of existing methods**: - Existing methods usually regard all edge weights as a long vector or use a few network summary indicators to summarize network information. These methods ignore the complete network structure, which may lead to poor inference in small - sample cases and are unable to design methods for identifying important network nodes. - These methods perform poorly when dealing with complex data (such as data in brain imaging studies) because they do not fully utilize the structural features of network data. 2. **Solutions proposed in the paper**: - The paper proposes a binary logistic regression framework, in which the network is used as a predictor variable and the binary response variable represents the probability that a sample belongs to a certain category. - A new class of network global - local shrinkage priors is used. This prior can introduce low - rank and near - sparse structures into the model, thereby better capturing the interaction effects and residual effects in the network. - The posterior calculation of the model is implemented through an efficient Markov chain Monte Carlo (MCMC) algorithm. 3. **Theoretical contributions**: - The paper also theoretically proves that the proposed framework has asymptotically optimal classification performance under specific conditions, especially when the number of network edges grows at a super - linear rate. - It provides insights into how the sparsity of the number of network nodes or the true network prediction coefficients changes with the sample size to achieve asymptotically optimal classification. In summary, by proposing a new Bayesian classification framework, this paper not only improves the accuracy of classification but also can identify network nodes and edges that have a significant impact on the classification results, which is of great significance for understanding complex network data (such as brain connectome data).