Abstract: Using machine-learning tools to predict individual phenotypes from neuroimaging data is one of the most promising and hence dynamic fields in systems neuroscience. Here, we perform a literature survey of the rapidly growing work on phenotype prediction in healthy subjects or the general population to sketch out the current state and ongoing developments in terms of data, analysis methods and reporting. Excluding papers on age prediction and clinical applications, which form a distinct literature, we identified a total of 108 papers published since 2007. In these, memory, fluid intelligence and attention were the most common phenotypes to be predicted, which resonates with the observation that roughly a quarter of the papers used data from the Human Connectome Project, even though another half recruited their own cohort. Sample size (in terms of training and external test sets) and prediction accuracy (from internal and external validation, respectively) did not show significant temporal trends. Prediction accuracy was negatively correlated with the sample size of the training set, but not with that of the external test set. While known to be optimistic, leave-one-out cross-validation (LOO CV) was the prevalent strategy for model validation (n = 48). Meanwhile, 27 studies used external validation with an external test set. Neither number showed a significant temporal trend. The most popular learning algorithm was connectome-based predictive modeling, introduced by the Yale team. Other common learning algorithms were linear regression, relevance vector regression (RVR), support vector regression (SVR), least absolute shrinkage and selection operator (LASSO), and elastic net. Meanwhile, the amount of data from self-recruiting studies (but not from studies using open, shared datasets) was positively correlated with internal validation prediction accuracy.
At the same time, self-recruiting studies also reported a significantly higher internal validation prediction accuracy than those using open, shared datasets. Data type and participant age did not significantly influence prediction accuracy. Confound control also did not influence prediction accuracy after adjusting for other factors. To conclude, most of the current literature is probably quite optimistic, given its reliance on internal validation using LOO CV. More effort should be made to encourage the use of external validation with external test sets to further improve the generalizability of the models.
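The two validation strategies contrasted in the abstract can be sketched on simulated data. This is a minimal illustration only, not the procedure of any surveyed study. It assumes a single synthetic "brain feature", a one-feature least-squares model, and prediction accuracy measured as the Pearson correlation between predicted and observed scores. All function names, the sample sizes and the simulation parameters are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def loo_cv_predictions(x, y):
    """Internal validation: predict each subject from a model fit on all others."""
    preds = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        preds.append(slope * x[i] + intercept)
    return preds


def simulate(n, rng, slope=0.5, noise=1.0):
    """Hypothetical cohort: one feature and a noisy, linearly related phenotype."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [slope * a + rng.gauss(0.0, noise) for a in x]
    return x, y


rng = random.Random(0)
x_train, y_train = simulate(100, rng)  # cohort used to build the model
x_ext, y_ext = simulate(100, rng)      # independent external test set

# Internal validation accuracy from leave-one-out cross-validation.
loo_preds = loo_cv_predictions(x_train, y_train)
r_internal = pearson_r(loo_preds, y_train)

# External validation: fit once on the full training cohort, then test elsewhere.
slope, intercept = fit_line(x_train, y_train)
ext_preds = [slope * a + intercept for a in x_ext]
r_external = pearson_r(ext_preds, y_ext)

print(f"LOO CV r = {r_internal:.3f}, external r = {r_external:.3f}")
```

Because both cohorts here are drawn from the same simulated distribution, the two accuracies need not differ much. With real data, differences between sites, scanners and populations are what an external test set is meant to expose.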

Reporting details of neuroimaging studies on individual traits prediction: A literature survey

Brain-phenotype predictions can survive across diverse real-world data

Individual characteristics outperform resting-state fMRI for the prediction of behavioral phenotypes

Brain-phenotype predictions of language and executive function can survive across diverse real-world data: Dataset shifts in developmental populations

Reliability and predictability of phenotype information from functional connectivity in large imaging datasets

Inferred vs traditional personality assessment: are we predicting the same thing?

Beyond functional connectivity: deep learning applied to resting-state fMRI time series in the prediction of 58 human traits in the HCP

Interpreting Brain Biomarkers: Challenges and solutions in interpreting machine learning-based predictive neuroimaging

Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls

Predicting sex, age, general cognition and mental health with machine learning on brain structural connectomes

Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience

Predicting neurodevelopmental disorders using machine learning models and electronic health records – status of the field

Clinical Prediction from Structural Brain MRI Scans: A Large-Scale Empirical Study

Computational limits to the legibility of the imaged human brain

The Burden of Reliability: How Measurement Noise Limits Brain-Behaviour Predictions

Evaluation of behavioral variance/covariance explained by the neuroimaging data through a pattern‐based regression

Individual cognitive traits can be predicted from task-based dynamic functional connectivity with a deep convolutional-recurrent model

Choosing explanation over performance: Insights from machine learning-based prediction of human intelligence from brain connectivity

An automated machine learning approach to predict brain age from cortical anatomical measures

Latent Similarity Identifies Important Functional Connections for Phenotype Prediction

Longitudinally stable, brain-based predictive models mediate the relationships between childhood cognition and socio-demographic, psychological and genetic factors