Abstract: Autism spectrum disorders (ASDs) in children tremendously impact people's lives, and the incidence and prevalence of ASDs are increasing globally. Global health organisations and autism-treatment centres specialising in autism diagnosis and detection face the challenge of providing an appropriate ASD diagnosis system that enables accurate analysis and early detection of autism. ASD detection is hampered by the unknown aetiology of the disorder, and its aetiological factors urgently need to be investigated. Accordingly, providing evidence for 'sociodemographic and family characteristics' as risk factors in predicting ASD is a complex scientific problem that needs to be solved. This study developed an early prediction model for diagnosing and detecting children with ASD, based on effective sociodemographic and family characteristic features related to ASD, using machine learning (ML). The proposed methodology involves three phases. The first, identification phase identifies a large-scale ASD dataset and applies the preprocessing stages: a 1-NN model for imputing missing data, feature selection using Chi2 and Relief, and an adaptive data-balancing approach using the Synthetic Minority Oversampling Technique (SMOTE). Chi2 and Relief are applied to determine the most effective sociodemographic and family characteristic features, and the preprocessing produces a new balanced ASD dataset. The second, development phase trains and tests eight ML methods on the newly prepared ASD dataset: decision tree, random forest, Naive Bayes, k-nearest neighbour (kNN), support vector machine (SVM), logistic regression, AdaBoost, and a multilayer perceptron (MLP) neural network. In the third phase, the developed model is evaluated using five metrics (accuracy, precision, recall, F1 and AUROC) together with test time in seconds.
Results indicated the following: (1) Of 10 highly effective sociodemographic and family characteristic features, seven that are related to autism cases are extracted. (2) Correlation sensitivity analysis reveals that 'Mom_age_at_child_birth' has the highest positive correlation with 'Father_age_at_child_birth', with an r-value of 0.751. Moreover, 'child_birth_month' and 'Birth_number' have the highest negative correlation with 'Ses_points_1_10', with an r-value of −0.07. (3) The AdaBoost, neural network, k-nearest neighbour, and decision tree methods show higher accuracy (0.9995, 0.9925, 0.9834, and 0.9786, respectively), whereas the random forest, logistic regression, and Naive Bayes methods show relatively lower accuracy (0.8297, 0.8199 and 0.8002, respectively). The support vector machine method shows the lowest accuracy (0.7105). AdaBoost also obtained the highest scores on the four other evaluation metrics (AUC = 0.9999, F1 = 0.9995, precision = 0.9995 and recall = 0.9995). Accordingly, the new preprocessed and balanced ASD dataset can be utilised as a data source for autism research. The preprocessing stages can be considered sound, as they yield better results than the original ASD dataset. The Chi2 and Relief feature-selection approaches gave similar results and substantially improved the classification accuracy. The study confirms the efficacy of the proposed prediction model compared with previous models on several points of comparison. The proposed model makes early prediction of autism possible.
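
The following is a minimal illustrative sketch, in plain Python, of three of the steps named in the abstract: 1-NN imputation of missing values, SMOTE-style synthetic oversampling of the minority class, and the accuracy, precision, recall and F1 metrics. It is not the authors' implementation. The function names (`impute_1nn`, `smote`, `binary_metrics`), the squared-Euclidean distance, the neighbour count `k` and the toy data are all assumptions made for illustration; a real study would more likely rely on an established ML library.

```python
import math
import random


def impute_1nn(rows):
    """Fill None entries in each row from its nearest complete row (1-NN).

    Assumption: distance is squared Euclidean over the features that are
    observed in the incomplete row, and at least one complete row exists.
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        known = [i for i, v in enumerate(r) if v is not None]
        nearest = min(
            complete,
            key=lambda c: sum((c[i] - r[i]) ** 2 for i in known),
        )
        filled.append([nearest[i] if v is None else v for i, v in enumerate(r)])
    return filled


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumption: the minority class has at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: math.dist(x, m),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy data only; these values are not taken from the study's dataset.
    print(impute_1nn([[1.0, 2.0], [1.1, None], [5.0, 9.0]]))
    print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=2))
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a pipeline like the one described, oversampling would normally be applied to the training split only, so that synthetic samples do not leak into the data used to compute the evaluation metrics.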

A conditional multi-label model to improve prediction of a rare outcome: An illustration predicting autism diagnosis

Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning

A deep learning predictive classifier for autism screening and diagnosis

A Prediction Model of Autism Spectrum Diagnosis from Well-Baby Electronic Data Using Machine Learning

Predicting Autism Spectrum Disorder Using Maternal Risk Factors: A Multi-Center Machine Learning Study

Transparent deep learning to identify autism spectrum disorders (ASD) in EHR using clinical notes

Machine Learning Prediction of Autism Spectrum Disorder Through Linking Mothers’ and Children’s Electronic Health Record Data

Early identification of autism spectrum disorder by multi-instrument fusion: A clinically applicable machine learning approach

Predicting Autism Spectrum Disorder: Transformer-Based Deep Learning Ensemble Framework Using Health Administrative & Birth Registry Data

A Clustered Optimal ROC Curve Method for Family-Based Genetic Risk Prediction

Machine Learning Prediction of Autism Spectrum Disorder From a Minimal Set of Medical and Background Information

Predicting autism traits from baby wellness records: A machine learning approach

Predicting neurodevelopmental disorders using machine learning models and electronic health records – status of the field

SCOPE: predicting future diagnoses in office visits using electronic health records

Early automated prediction model for the diagnosis and detection of children with autism spectrum disorders based on effective sociodemographic and family characteristic features

Copy Number Variation Informs fMRI-based Prediction of Autism Spectrum Disorder

Comprehensive exploration of multi-modal and multi-branch imaging markers for autism diagnosis and interpretation: insights from an advanced deep learning model

Harnessing the power of child development records to detect early neurodevelopmental disorders using Bayesian analysis

Multimodal neuroimaging-based prediction of adult outcomes in childhood-onset ADHD using ensemble learning techniques

Reliable Autism Spectrum Disorder Diagnosis for Pediatrics Using Machine Learning and Explainable AI

Predicting Risk of Alzheimer’s Diseases and Related Dementias with AI Foundation Model on Electronic Health Records