Deriving reproducible biomarkers from multi-site resting-state data: An Autism-based example

Alexandre Abraham, Michael Milham, Adriana Di Martino, R. Cameron Craddock, Dimitris Samaras, Bertrand Thirion, Gaël Varoquaux

DOI: https://doi.org/10.1016/j.neuroimage.2016.10.045

2016-11-18

Abstract: Resting-state functional Magnetic Resonance Imaging (R-fMRI) holds the promise to reveal functional biomarkers of neuropsychiatric disorders. However, extracting such biomarkers is challenging for complex multi-faceted neuropathologies, such as autism spectrum disorders. Large multi-site datasets increase sample sizes to compensate for this complexity, at the cost of uncontrolled heterogeneity. This heterogeneity raises new challenges, akin to those faced in realistic diagnostic applications. Here, we demonstrate the feasibility of inter-site classification of neuropsychiatric status, with an application to the Autism Brain Imaging Data Exchange (ABIDE) database, a large (N=871) multi-site autism dataset. For this purpose, we investigate pipelines that extract the most predictive biomarkers from the data. These R-fMRI pipelines build participant-specific connectomes from functionally-defined brain areas. Connectomes are then compared across participants to learn patterns of connectivity that differentiate typical controls from individuals with autism. We predict this neuropsychiatric status for participants from the same acquisition sites or different, unseen, ones. Good choices of methods for the various steps of the pipeline lead to 67% prediction accuracy on the full ABIDE data, which is significantly better than previously reported results. We perform extensive validation on multiple subsets of the data defined by different inclusion criteria. This enables detailed analysis of the factors contributing to successful connectome-based prediction. First, prediction accuracy improves as we include more subjects, up to the maximum number of subjects available. Second, the definition of functional brain areas is of paramount importance for biomarker discovery: brain areas extracted from large R-fMRI datasets outperform reference atlases in the classification tasks.

Machine Learning, Neurons and Cognition

What problem does this paper attempt to address?

The paper addresses how to extract reproducible biomarkers from multi-center resting-state functional magnetic resonance imaging (R-fMRI) data, using autism spectrum disorder (ASD) as the test case. Specifically, the researchers face the following challenges:

1. **Data heterogeneity**: Large multi-center datasets increase the sample size, but they also introduce uncontrolled heterogeneity, which raises the same kind of challenges found in practical diagnostic applications. For example, research centers may differ in MRI acquisition protocols, participant instructions (such as eyes open or closed), and recruitment strategies (such as age groups, IQ ranges, impairment levels, treatment histories, and acceptable comorbidities). These differences affect both the extraction of biomarkers and the accuracy of diagnosis.
2. **Reproducibility and generalization ability of biomarkers**: Previous studies have shown that R-fMRI can be used to identify biomarkers, but the reproducibility and generalization ability of these methods in research or clinical settings remain controversial. Most R-fMRI studies have small sample sizes, and the differences in data acquisition, image processing, and sampling strategies across studies have not been quantified.
3. **Robustness of the prediction model**: To evaluate the generalization ability of a model, researchers need to test it on unseen data, that is, use cross-validation. However, traditional cross-validation strategies usually do not account for potential site-specific confounding factors. This study therefore measures model performance under uncontrolled variation by holding out entire sites, which more realistically simulates the clinical setting.
4. **Selection of the data processing flow**: The individual steps of the functional connectivity processing flow (brain region definition, time-series extraction, matrix estimation, and classification) each have a large impact on the results. The lack of ground truth for the functional architecture makes it difficult to validate an R-fMRI processing flow. The researchers therefore evaluate different options at each step to determine the optimal parameter-free processing flow.

By addressing these problems, the researchers aim to show that cross-site biomarkers of neuropsychiatric status can be learned reliably from heterogeneous multi-center data, and to provide an effective pipeline for extracting R-fMRI neuro-phenotypes.
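The site hold-out evaluation in challenge 3 and the pipeline steps in challenge 4 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the function names (`connectome_features`, `leave_one_site_out`, `nearest_centroid_predict`) are hypothetical, plain correlation stands in for the matrix estimation step, and a nearest-centroid rule stands in for the classifiers compared in the paper. It is not the authors' implementation.

```python
import numpy as np


def connectome_features(time_series):
    """Flatten the region-by-region correlation matrix of one participant.

    time_series has shape (n_timepoints, n_regions); only the strictly lower
    triangle is kept because the matrix is symmetric with a unit diagonal.
    """
    corr = np.corrcoef(time_series, rowvar=False)
    rows, cols = np.tril_indices_from(corr, k=-1)
    return corr[rows, cols]


def leave_one_site_out(sites):
    """Yield (site, train_idx, test_idx), holding out each site exactly once."""
    sites = np.asarray(sites)
    for site in np.unique(sites):
        held_out = sites == site
        yield site, np.flatnonzero(~held_out), np.flatnonzero(held_out)


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class with the closest training centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


# Synthetic stand-in for a multi-site dataset (all sizes are arbitrary).
rng = np.random.default_rng(0)
n_sites, per_site, n_timepoints, n_regions = 4, 20, 100, 10
features, labels, sites = [], [], []
for site in range(n_sites):
    for subject in range(per_site):
        label = subject % 2
        ts = rng.standard_normal((n_timepoints, n_regions))
        if label == 1:
            ts[:, 1] += 0.8 * ts[:, 0]  # group effect: couple regions 0 and 1
        # Site effect: a shared signal whose strength differs between sites.
        ts += 0.1 * site * rng.standard_normal((n_timepoints, 1))
        features.append(connectome_features(ts))
        labels.append(label)
        sites.append(site)
X, y, sites = np.array(features), np.array(labels), np.array(sites)

# Train on all sites but one, test on the held-out site, and repeat.
accuracies = {}
for site, train_idx, test_idx in leave_one_site_out(sites):
    predicted = nearest_centroid_predict(X[train_idx], y[train_idx], X[test_idx])
    accuracies[int(site)] = float((predicted == y[test_idx]).mean())

for site, accuracy in accuracies.items():
    print(f"held-out site {site}: accuracy {accuracy:.2f}")
```

Because no participant from the held-out site appears in the training set, each score reflects performance on an acquisition setting the model has never seen. An ordinary shuffled split mixes sites across the training and test sets, so it cannot measure this.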

Deriving and validating biomarkers associated with autism spectrum disorders from a large-scale resting-state database

Identification of autism spectrum disorder using multi-regional resting-state data through an attention learning approach

Attentional Connectivity-based Prediction of Autism Using Heterogeneous Rs-Fmri Data from CC200 Atlas

Identifying Autism from Resting-State fMRI Using Long Short-Term Memory Networks

Understanding the Role of Connectivity Dynamics of Resting-State Functional MRI in the Diagnosis of Autism Spectrum Disorder: A Comprehensive Study

ICA-based Resting-State Networks Obtained on Large Autism fMRI Dataset ABIDE

Uncovering Multi-Site Identifiability Based on Resting-State Functional Connectomes

Insights from an autism imaging biomarker challenge: Promises and threats to biomarker discovery

Multi-site clustering and nested feature extraction for identifying autism spectrum disorder with resting-state fMRI

Topological Properties of Resting-State fMRI Functional Networks Improve Machine Learning-Based Autism Classification

Enhancing Autism Spectrum Disorder identification in multi-site MRI imaging: A multi-head cross-attention and multi-context approach for addressing variability in un-harmonized data

Confounding Effects on the Performance of Machine Learning Analysis of Static Functional Connectivity Computed from rs-fMRI Multi-site Data

Improving Multi-Site Autism Classification Via Site-Dependence Minimization and Second-Order Functional Connectivity

Decoding autism: Uncovering patterns in brain connectivity through sparsity analysis with rs-fMRI data

Ensemble Deep Learning on Large, Mixed-Site fMRI Datasets in Autism and Other Tasks

Hybrid parcellation mapping approach for the extraction of connectivity measures in autism spectrum disorder fMRI data

Generalizability and reproducibility of functional connectivity in autism

Multi-site Diagnostic Classification of Autism Spectrum Disorder Using Adversarial Deep Learning on Resting-State Fmri.

Identifying Autism Spectrum Disorder From Resting-State fMRI Using Deep Belief Network

The Development of a Practical Artificial Intelligence Tool for Diagnosing and Evaluating Autism Spectrum Disorder: Multicenter Study