Abstract:

Background: Early diagnosis and access to resources, support and therapy are critical for improving long-term outcomes for children with autism spectrum disorder (ASD). ASD is typically detected using a case-finding approach based on symptoms and family history, resulting in many delayed or missed diagnoses. While population-based screening would be ideal for early identification, available screening tools have limited accuracy. This study aims to determine whether machine learning models applied to health administrative and birth registry data can identify young children (aged 18 months to 5 years) who are at increased likelihood of developing ASD.

Methods: We assembled the study cohort using individually linked maternal-newborn data from the Better Outcomes Registry and Network (BORN) Ontario database. The cohort included all live births in Ontario, Canada, between April 1, 2006, and March 31, 2018, linked to datasets from Newborn Screening Ontario (NSO), Prenatal Screening Ontario (PSO), and the Canadian Institute for Health Information (CIHI), specifically the Discharge Abstract Database (DAD) and the National Ambulatory Care Reporting System (NACRS). The NSO and PSO datasets provided screening biomarker values and outcomes, while DAD and NACRS contained diagnosis codes and intervention codes for mothers and offspring. Extreme Gradient Boosting models and large-scale ensembled Transformer deep learning models were developed to predict ASD diagnosis between 18 and 60 months of age. Using explainable artificial intelligence methods, we identified the factors that contribute most to increased likelihood of ASD at both the individual and population levels.

Results: The final study cohort included 703,894 mother-offspring pairs, with 10,964 identified cases of ASD. The best-performing ensemble of Transformer models achieved an area under the receiver operating characteristic curve of 69.6% for predicting ASD diagnosis, with a sensitivity of 70.9% and a specificity of 56.9%. Our model can be used to identify an enriched pool of children with the greatest likelihood of developing ASD, demonstrating the feasibility of this approach.

Conclusions: This study highlights the feasibility of employing machine learning models and routinely collected health data to systematically identify young children at high likelihood of developing ASD. Ensemble Transformer models applied to health administrative and birth registry data offer a promising avenue for universal ASD screening. Such early detection enables targeted and formal assessment for timely diagnosis and early access to resources, support, or therapy.
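
To make the "enriched pool" claim concrete, the reported figures can be combined with Bayes' rule to estimate how much more common ASD would be among children flagged by the model than in the cohort overall. The following is a rough sketch only. It assumes the reported sensitivity (70.9%) and specificity (56.9%) hold at the cohort-wide prevalence of 10,964 / 703,894; the abstract does not state which evaluation split or threshold produced them. The function name `screening_yield` is hypothetical, and the derived values are illustrative arithmetic, not results reported by the authors.

```python
def screening_yield(n_total, n_cases, sensitivity, specificity):
    """Return (prevalence, positive predictive value, enrichment factor).

    Assumption: sensitivity and specificity apply uniformly to the
    whole cohort, which the abstract does not confirm.
    """
    prevalence = n_cases / n_total
    # Fraction of the cohort that is a true positive / false positive.
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    # Share of flagged children who actually have an ASD diagnosis.
    ppv = true_pos / (true_pos + false_pos)
    # How much more common ASD is among flagged children than overall.
    enrichment = ppv / prevalence
    return prevalence, ppv, enrichment


# Figures taken from the Results section of the abstract.
prevalence, ppv, enrichment = screening_yield(703_894, 10_964, 0.709, 0.569)
print(f"prevalence={prevalence:.4f} ppv={ppv:.4f} enrichment={enrichment:.2f}x")
```

Under these assumptions the flagged group is enriched for ASD relative to the base rate, but most flagged children would still not receive a diagnosis. This is consistent with the abstract framing the model as a way to prioritise children for formal assessment and not as a diagnostic test.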

Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning

A deep learning predictive classifier for autism screening and diagnosis

Early identification of autism spectrum disorder by multi-instrument fusion: A clinically applicable machine learning approach

Single Nucleotide Polymorphisms Predict Symptom Severity of Autism Spectrum Disorder

Discovery and validation of novel genes in a large Chinese ASD cohort

Comprehensive exploration of multi-modal and multi-branch imaging markers for autism diagnosis and interpretation: insights from an advanced deep learning model

Identification of De Novo Mutations in the Chinese ASD Cohort Via Whole-Exome Sequencing Unveils Brain Regions Implicated in Autism

Artificial intelligence and bioinformatics analyze markers of children's transcriptional genome to predict autism spectrum disorder

Targeted Resequencing of 358 Candidate Genes for Autism Spectrum Disorder in a Chinese Cohort Reveals Diagnostic Potential and Genotype-Phenotype Correlations

Application of deep learning algorithm on whole genome sequencing data uncovers structural variants associated with multiple mental disorders in African American patients

Common risk variants identified in autism spectrum disorder

Unraveling the immunogenetic landscape of autism spectrum disorder: a comprehensive bioinformatics approach

DeepASDPred: a CNN-LSTM-based deep learning method for Autism spectrum disorders risk RNA identification

Predicting Autism Spectrum Disorder: Transformer-Based Deep Learning Ensemble Framework Using Health Administrative & Birth Registry Data

Identification of Novel Diagnostic Neuroimaging Biomarkers for Autism Spectrum Disorder Through Convolutional Neural Network-Based Analysis of Functional, Structural, and Diffusion Tensor Imaging Data Towards Enhanced Autism Diagnosis

Targeted Sequencing and Clinical Strategies in Children with Autism Spectrum Disorder: A Cohort Study

Predicting Autism Spectrum Disorder Using Maternal Risk Factors: A Multi-Center Machine Learning Study

Discovering the gene-brain-behavior link in autism via generative machine learning

Early identification of autism spectrum disorder based on machine learning with eye-tracking data

Diagnosing autism spectrum disorder in children using conventional MRI and apparent diffusion coefficient based deep learning algorithms

A deep learning model for prediction of autism status using whole-exome sequencing data