Abstract:

IMPORTANCE Neuroimaging-based artificial intelligence (AI) diagnostic models have proliferated in psychiatry. However, their clinical applicability and reporting quality (ie, feasibility) for clinical practice have not been systematically evaluated.

OBJECTIVE To systematically assess the risk of bias (ROB) and reporting quality of neuroimaging-based AI models for psychiatric diagnosis.

EVIDENCE REVIEW PubMed was searched for peer-reviewed, full-length articles published between January 1, 1990, and March 16, 2022. Studies aimed at developing or validating neuroimaging-based AI models for clinical diagnosis of psychiatric disorders were included. Reference lists were further searched for suitable original studies. Data extraction followed the CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies) and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guidelines. A closed-loop cross-sequential design was used for quality control. The PROBAST (Prediction Model Risk of Bias Assessment Tool) and modified CLEAR (Checklist for Evaluation of Image-Based Artificial Intelligence Reports) benchmarks were used to systematically evaluate ROB and reporting quality.

FINDINGS A total of 517 studies presenting 555 AI models were included and evaluated. Of these models, 461 (83.1%; 95% CI, 80.0%-86.2%) were rated as having a high overall ROB based on the PROBAST. The ROB was particularly high in the analysis domain, including inadequate sample size (398 of 555 models [71.7%; 95% CI, 68.0%-75.6%]), poor model performance examination (with 100% of models lacking calibration examination), and lack of handling data complexity (550 of 555 models [99.1%; 95% CI, 98.3%-99.9%]). None of the AI models was perceived to be applicable to clinical practice. Overall reporting completeness (ie, number of reported items/number of total items) for the AI models was 61.2% (95% CI, 60.6%-61.8%), and the completeness was poorest for the technical assessment domain, at 39.9% (95% CI, 38.8%-41.1%).

CONCLUSIONS AND RELEVANCE This systematic review found that the clinical applicability and feasibility of neuroimaging-based AI models for psychiatric diagnosis were challenged by a high ROB and poor reporting quality. ROB in AI diagnostic models, particularly in the analysis domain, should be addressed before clinical application.
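
The proportions and 95% CIs reported in the findings can be approximately rechecked from the stated counts. The sketch below uses a normal-approximation (Wald) interval; the abstract does not say which interval method the authors used, so this choice is an assumption, and the function name `wald_ci` is hypothetical.

```python
import math

# Assumption: a normal-approximation (Wald) interval with z = 1.96.
# The abstract does not state the authors' actual CI method.
Z_95 = 1.96


def wald_ci(events: int, total: int) -> tuple[float, float, float]:
    """Return (proportion, lower bound, upper bound) for events/total."""
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("need 0 <= events <= total and total > 0")
    p = events / total
    half_width = Z_95 * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1], since the Wald interval can overshoot near the edges.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Counts taken from the FINDINGS paragraph of the abstract.
    reported = {
        "high overall ROB": (461, 555),
        "inadequate sample size": (398, 555),
        "data complexity not handled": (550, 555),
    }
    for label, (events, total) in reported.items():
        p, lo, hi = wald_ci(events, total)
        print(f"{label}: {p:.1%} (95% CI, {lo:.1%}-{hi:.1%})")
```

Under this assumption the recomputed bounds agree with the reported intervals to within about 0.1 percentage points. The small differences suggest the authors may have used a slightly different method or rounding.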

Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark

RoBIn: A Transformer-Based Model For Risk Of Bias Inference With Machine Reading Comprehension

RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials

Comparing machine and human reviewers to evaluate the risk of bias in randomized controlled trials

Automating risk of bias assessment for clinical trials

Towards the automatic risk of bias assessment on randomized controlled trials: A comparison of RobotReviewer and humans

A case study of the informative value of risk of bias and reporting quality assessments for systematic reviews

[Introduction of a Tool to Assess Risk of Bias in Non-randomized Studies-of Environmental Exposure (ROBINS-E)].

ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions

A tool to assess risk of bias in non-randomized follow-up studies of exposure effects (ROBINS-E)

Exploring the potential of Claude 2 for risk of bias assessment: Using a large language model to assess randomized controlled trials with RoB 2

Common challenges and suggestions for risk of bias tool development: a systematic review of methodological studies

Risk of Bias Assessment: (8) Risk of Bias in Systematic Review (ROBIS)

Automating risk of bias assessment in systematic reviews: a real-time mixed methods comparison of human researchers to a machine learning system

Zero- and few-shot prompting of generative large language models provides weak assessment of risk of bias in clinical trials

[Introduction of a Tool to Assess Risk of Bias in Non-randomized Studies-of Exposure (2022)].

CLIMB: A Benchmark of Clinical Bias in Large Language Models

Reducing Biases towards Minoritized Populations in Medical Curricular Content via Artificial Intelligence for Fairer Health Outcomes

SYRCLE’s risk of bias tool for animal studies

Evaluation of Risk of Bias in Neuroimaging-Based Artificial Intelligence Models for Psychiatric Diagnosis: A Systematic Review

Use and Reporting of Risk of Bias Tools in 825 Systematic Reviews of Acupuncture: a Cross-Sectional Study.