Abstract: Modern attempts to provide predictive risk for complex disorders, such as schizophrenia, integrate genetic and brain information in what is known as imaging genetics. In this work, we propose inferential and predictive methods to relate the presence of a complex disorder, schizophrenia, to genetic and imaging features and to predict its risk for new individuals. Given functional magnetic resonance imaging (fMRI) and single nucleotide polymorphism (SNP) information from healthy individuals and people diagnosed with schizophrenia, we use a Bayesian probit model to select discriminating variables; to estimate the predictive risk, the most promising models are combined using a Bayesian model averaging scheme. For these purposes, we propose an informed reversible jump Markov chain Monte Carlo, named data driven or informed reversible jump, which is scalable to high-dimensional data when the number of covariates is much larger than the sample size.

What problem does this paper attempt to address?

This paper aims to predict the risk of schizophrenia by integrating genetic and brain imaging information. Specifically, the researchers propose a Bayesian method for selecting the genetic features (single-nucleotide polymorphisms, SNPs) and functional brain regions (regions of interest, ROIs) related to schizophrenia, and for using these features to predict an individual's risk of developing the disorder. The proposed models and methods aim to improve the accuracy of early detection of schizophrenia, thereby enabling earlier targeted treatment and potentially preventing or delaying the development of the disease.

### Research Background

Schizophrenia is a complex multifactorial disease whose etiology and pathophysiological mechanisms have not been fully elucidated. Approximately 1% of the global population is estimated to be affected, and common symptoms include hallucinations, delusions, cognitive dysfunction, disorganized thinking, and reduced movement. Currently, the diagnosis of schizophrenia relies mainly on symptom observation and lacks effective medical detection methods. Developing new methods or tests to assist existing medical tools is therefore of great public health significance.

### Research Methods

The researchers used functional magnetic resonance imaging (fMRI) and single-nucleotide polymorphism (SNP) data from healthy individuals and patients diagnosed with schizophrenia. They adopted a Bayesian probit model to select discriminating variables and used Bayesian Model Averaging (BMA) to estimate the predicted risk. To this end, they proposed a Data-Driven Reversible Jump (DDRJ) Markov chain Monte Carlo algorithm, which can handle high-dimensional data even when the number of covariates is much larger than the sample size.
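To make the model-averaging step concrete, the following is a minimal sketch of how a BMA predictive risk could be computed under a probit link. It is not the authors' implementation: the function names, the feature names (`roi_1`, `snp_7_add`), the model weights, and the coefficients are all hypothetical, and each model is represented by a single point estimate of its coefficients rather than by posterior draws.

```python
import math


def probit_prob(linear_predictor):
    """Standard normal CDF, i.e. P(Y = 1) under a probit link."""
    return 0.5 * (1.0 + math.erf(linear_predictor / math.sqrt(2.0)))


def bma_risk(x, models):
    """Average P(Y = 1 | x) over models, weighted by normalized model weight.

    models: list of (weight, intercept, {feature_name: coefficient}).
    The weights stand in for posterior model probabilities (assumption).
    """
    total_weight = sum(weight for weight, _, _ in models)
    risk = 0.0
    for weight, intercept, coefs in models:
        # Linear predictor uses only the features included in this model.
        eta = intercept + sum(c * x[name] for name, c in coefs.items())
        risk += (weight / total_weight) * probit_prob(eta)
    return risk


# Hypothetical new individual and two hypothetical candidate models.
x_new = {"roi_1": 0.5, "snp_7_add": 1.0}
models = [
    (0.6, 0.0, {"roi_1": 1.0}),
    (0.4, 0.0, {"roi_1": 1.0, "snp_7_add": -0.5}),
]
risk = bma_risk(x_new, models)
print(f"BMA predictive risk: {risk:.4f}")
```

Normalizing the weights inside `bma_risk` means that unnormalized quantities, such as the number of times a sampler visited each model, could be passed directly; that choice is a convenience of this sketch, not something stated in the source.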
### Model Description

Under the Bayesian framework, the researchers assumed a probit model in which the unobservable latent variable \( Y^* \) follows a normal distribution:

\[ Y_i^* = \beta_0 + \sum_{p \in G} \beta_p X_{ip} + \sum_{k \in M} \alpha_k Z_{ik} + \sum_{k \in M} \delta_k (1 - |Z_{ik}|) + \xi, \quad \xi \sim N(0, 1) \]

where:

- \( Y_i^* \) is the latent variable of the \( i \)-th individual.
- \( \beta_0 \) is the intercept term.
- \( \beta_p \) is the effect coefficient of the \( p \)-th ROI.
- \( \alpha_k \) and \( \delta_k \) are the additive and dominance effects of the \( k \)-th SNP, respectively.
- \( X_{ip} \) is the BOLD intensity of the \( i \)-th individual in the \( p \)-th ROI.
- \( Z_{ik} \) is the genotype of the \( i \)-th individual at the \( k \)-th SNP, taking values \( -1, 0, 1 \), corresponding to genotypes \( aa, aA, AA \) respectively.
- \( \xi \) is the error term, following a standard normal distribution.

### DDRJ Algorithm

The core of the DDRJ algorithm is an informed proposal for the next model, which makes model selection more efficient. The specific steps include:

1. **Birth**: Select a variable highly correlated with the current model residuals from the remaining candidate variables and add it to the model.
2. **Death**: Select a variable to be removed from the model according to its importance in the current model (such as the coefficient size).

### Prediction Performance Evaluation

The researchers used 5-fold cross-validation to evaluate the prediction performance of the model, reporting the misclassification error (MCE) and the area under the ROC curve (AUC). The results showed that DDRJ performed well in all scenarios, accurately selected all relevant variables, and outperformed random forests in prediction performance.
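The SNP coding in the model and the birth and death moves of DDRJ can be illustrated with a short sketch. This is a hedged illustration under stated assumptions, not the paper's algorithm: the summary only says that birth favors variables correlated with the current residuals and that death depends on importance such as coefficient size, so the specific weightings below (proportional to absolute correlation, and inversely proportional to absolute coefficient) are assumptions, as are all function and variable names. The Metropolis-Hastings acceptance step that a full reversible jump sampler requires is omitted.

```python
import math
import random


def snp_terms(z):
    """Additive (z) and dominance (1 - |z|) covariates for a genotype in {-1, 0, 1}."""
    return z, 1 - abs(z)


def abs_correlation(u, v):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    if suu == 0 or svv == 0:
        return 0.0
    return abs(suv) / math.sqrt(suu * svv)


def birth_probabilities(residuals, candidates):
    """Birth proposal: weight each excluded variable by |corr| with residuals (assumed)."""
    scores = {name: abs_correlation(residuals, col) for name, col in candidates.items()}
    total = sum(scores.values())
    if total == 0:
        # Fall back to a uniform proposal when no candidate is informative.
        return {name: 1 / len(scores) for name in scores}
    return {name: score / total for name, score in scores.items()}


def death_probabilities(coefs, eps=1e-8):
    """Death proposal: variables with small |coefficient| are more likely removed (assumed)."""
    inverse = {name: 1 / (abs(c) + eps) for name, c in coefs.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def propose(probabilities, rng):
    """Draw one variable name according to the proposal probabilities."""
    names = list(probabilities)
    return rng.choices(names, weights=[probabilities[n] for n in names], k=1)[0]


# Hypothetical toy data: residuals of the current model and two excluded variables.
residuals = [1.0, -1.0, 0.5, -0.5]
candidates = {
    "roi_a": [2.0, -2.0, 1.0, -1.0],  # proportional to the residuals
    "snp_b_add": [1, 1, -1, -1],      # uncorrelated with the residuals
}
birth = birth_probabilities(residuals, candidates)
proposed = propose(birth, random.Random(0))
death = death_probabilities({"roi_a": 2.0, "snp_b_add": 0.5})
print(birth, proposed, death)
```

In this toy example the birth proposal concentrates on `roi_a`, the only candidate correlated with the residuals, while the death proposal favors removing `snp_b_add`, which has the smaller coefficient. Because the proposals are not uniform, a complete sampler would need to include these proposal probabilities in its acceptance ratio.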

Bayesian variable selection using an informed reversible jump in imaging genetics: an application to schizophrenia
