Abstract: Autism spectrum disorders (ASDs) in children tremendously impact people's lives, and the incidence and prevalence of ASDs are increasing globally. Global health organisations and autism-treatment centres specialising in diagnosis and detection face the challenge of providing an appropriate ASD diagnosis system that enables accurate analysis and early detection of autism. ASD detection is hampered by the unknown aetiology of the disorder, and its aetiological factors urgently need investigation. Accordingly, providing evidence for 'sociodemographic and family characteristics' as risk factors in predicting ASD is a complex scientific problem that needs to be solved. This study developed an early prediction model for diagnosing and detecting children with ASD, using machine learning (ML) on effective sociodemographic and family characteristic features related to ASD. The proposed methodology involves three phases. The first, identification phase identifies a large-scale ASD dataset and applies the preprocessing stages: a 1-NN model for imputing missing data, feature selection using Chi2 and Relief, and an adaptive data-balancing approach using the Synthetic Minority Oversampling Technique (SMOTE). Chi2 and Relief are applied to determine the most effective sociodemographic and family characteristic features, and the preprocessing produces a new balanced ASD dataset. The second, development phase trains and tests eight ML methods on the newly prepared ASD dataset: decision tree, random forest, Naive Bayes, k-nearest neighbour (kNN), support vector machine (SVM), logistic regression, AdaBoost, and neural network multilayer perceptron (MLP). In the third phase, the developed model is evaluated using five metrics (accuracy, precision, recall, F1 and AUROC) together with test time in seconds.
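As a rough illustration of two of the preprocessing stages named above (1-NN imputation of missing values and SMOTE-style balancing), the following is a minimal standard-library sketch. It is not the authors' implementation: the function names `nearest_neighbour_impute` and `smote_like_oversample`, the use of Euclidean distance, and the toy rows are all assumptions made for illustration.

```python
import math
import random


def nearest_neighbour_impute(rows):
    """Fill None entries in each row from its single nearest complete row (1-NN)."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to impute from")
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        # Distance is computed only over the features observed in this row.
        observed = [i for i, v in enumerate(row) if v is not None]
        donor = min(
            complete,
            key=lambda c: math.dist([row[i] for i in observed],
                                    [c[i] for i in observed]),
        )
        imputed.append([donor[i] if v is None else v for i, v in enumerate(row)])
    return imputed


def smote_like_oversample(minority, n_new, k=1, seed=0):
    """Create n_new synthetic minority points by interpolating towards neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point.
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up rows (not taken from the ASD dataset).
rows = [[1.0, 2.0], [1.1, None], [5.0, 6.0]]
filled = nearest_neighbour_impute(rows)
extra = smote_like_oversample([[0.0, 0.0], [1.0, 1.0]], n_new=5, seed=1)
```

In practice a library implementation (for example a k-NN imputer and a SMOTE class from an established ML package) would normally be preferred; the sketch only shows the idea that each synthetic sample lies on the line segment between a minority sample and one of its nearest minority neighbours.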
Results indicated the following: (1) Out of 10 highly effective sociodemographic and family characteristic features, seven related to autism cases are extracted. (2) Correlation sensitivity analysis reveals that 'Mom_age_at_child_birth' has the highest positive correlation with 'Father_age_at_child_birth', with an r-value of 0.751. Moreover, 'child_birth_month' and 'Birth_number' have the highest negative correlation with 'Ses_points_1_10', with an r-value of −0.07. (3) The AdaBoost, neural network, k-nearest neighbour, and decision tree methods show higher accuracy (0.9995, 0.9925, 0.9834, and 0.9786, respectively), whereas the random forest, logistic regression, and Naive Bayes methods show relatively lower accuracy (0.8297, 0.8199 and 0.8002, respectively), and the support vector machine method shows the lowest accuracy (0.7105). AdaBoost also obtained the highest scores on the four other evaluation metrics (AUC = 0.9999, F1 = 0.9995, precision = 0.9995 and recall = 0.9995). Accordingly, the new preprocessed and balanced ASD dataset can be utilised as a data source for autism research. The preprocessing stages can be considered sound, as they yield better results than the original ASD dataset. The similar results obtained from the Chi2 and Relief feature-selection approaches substantially improved the classification accuracy. The study confirms the efficacy of the proposed prediction model compared with previous models on several points of comparison. Early prediction of autism is possible through this proposed model.
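
For readers unfamiliar with the quantities reported above, the following minimal sketch shows how a Pearson r-value and four of the evaluation metrics (accuracy, precision, recall, F1) are computed from raw values. The function names `pearson_r` and `binary_metrics` and the toy inputs are hypothetical and are not the study's code or data; AUROC is omitted because it requires ranked prediction scores rather than hard labels.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy, made-up labels: one of the two positive cases is missed.
scores = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
# Toy, made-up values that rise together, so r is close to 1.
r = pearson_r([1, 2, 3], [2, 4, 6])
```

An r-value near 0.75, such as the one reported for the two parental-age features, indicates a strong positive linear relationship, whereas a value near −0.07 indicates almost no linear relationship.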

Use of Machine Learning Models to Differentiate Neurodevelopment Conditions Through Digitally Collected Data: Cross-Sectional Questionnaire Study

Use of machine learning on clinical questionnaires data to support the diagnostic classification of Attention Deficit Hyperactivity Disorder: a personalized medicine approach

Digitally Diagnosing Multiple Developmental Delays Using Crowdsourcing Fused With Machine Learning: Protocol for a Human-in-the-Loop Machine Learning Study

Predicting neurodevelopmental disorders using machine learning models and electronic health records – status of the field

Toward Digital Phenotypes of Early Childhood Mental Health via Unsupervised and Supervised Machine Learning

Digitally Diagnosing Multiple Developmental Delays using Crowdsourcing fused with Machine Learning: A Research Protocol

Machine Learning for Differential Diagnosis Between Clinical Conditions With Social Difficulty: Autism Spectrum Disorder, Early Psychosis, and Social Anxiety Disorder

Multiclass Classification of Autism Spectrum Disorder, Attention Deficit Hyperactivity Disorder, and Typically Developed Individuals Using fMRI Functional Connectivity Analysis

Using Machine Learning for Motion Analysis to Early Detect Autism Spectrum Disorder: A Systematic Review

Early identification of autism spectrum disorder by multi-instrument fusion: A clinically applicable machine learning approach

A personalized classification of behavioral severity of autism spectrum disorder using a comprehensive machine learning framework

A data driven machine learning approach to differentiate between autism spectrum disorder and attention-deficit/hyperactivity disorder based on the best-practice diagnostic instruments for autism

A systematic review on the application of machine learning models in psychometric questionnaires for the diagnosis of attention deficit hyperactivity disorder

Machine Learning Prediction of Autism Spectrum Disorder From a Minimal Set of Medical and Background Information

Early automated prediction model for the diagnosis and detection of children with autism spectrum disorders based on effective sociodemographic and family characteristic features

Identifying the neurodevelopmental and psychiatric signatures of genomic disorders associated with intellectual disability: a machine learning approach

A Prediction Model of Autism Spectrum Diagnosis from Well-Baby Electronic Data Using Machine Learning

Reliable Autism Spectrum Disorder Diagnosis for Pediatrics Using Machine Learning and Explainable AI

Predicting individual cases of major adolescent psychiatric conditions with artificial intelligence

Annual Research Review: Translational machine learning for child and adolescent psychiatry

Prediction Models of Functional Outcomes for Individuals in the Clinical High-Risk State for Psychosis or With Recent-Onset Depression: A Multimodal, Multisite Machine Learning Analysis