Early automated prediction model for the diagnosis and detection of children with autism spectrum disorders based on effective sociodemographic and family characteristic features
A. S. Albahri, Rula A. Hamid, A. A. Zaidan, O. S. Albahri
DOI: https://doi.org/10.1007/s00521-022-07822-0
2023-01-08
Neural Computing and Applications
Abstract: Autism spectrum disorders (ASDs) in children tremendously impact people's lives, and the incidence and prevalence of ASDs are increasing globally. Global health organisations and autism-treatment centres specialising in autism diagnosis and detection face challenges in providing an appropriate ASD diagnosis system that enables accurate analysis and early detection of autism. ASD detection is hampered by the unknown aetiology of the disorder, and its aetiological factors urgently need investigation. Accordingly, providing evidence for 'sociodemographic and family characteristics' as risk factors in predicting ASD is a complex scientific problem that needs to be solved. This study developed an early prediction model for diagnosing and detecting children with ASD based on effective sociodemographic and family characteristic features related to ASD, using machine learning (ML). The proposed methodology involves three phases. The first (identification) phase identifies a large-scale ASD dataset and applies the preprocessing stages: a 1-NN model for imputing missing data, feature selection using Chi2 and Relief, and an adaptive data-balancing approach using the Synthetic Minority Oversampling Technique (SMOTE). Chi2 and Relief are applied to determine the most effective sociodemographic and family characteristic features and produce a new balanced ASD dataset. The second (development) phase trains and tests the newly prepared ASD dataset with eight ML methods: decision tree, random forest, Naive Bayes, k-nearest neighbour (kNN), support vector machine (SVM), logistic regression, AdaBoost, and neural network multilayer perceptron (MLP). The developed model is evaluated in the third phase using five metrics (accuracy, precision, recall, F1 and AUROC) together with test time in seconds.
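The four threshold-based evaluation metrics named above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `binary_metrics`, the label encoding (1 = ASD, 0 = non-ASD) and the toy labels are assumptions made for illustration, and the abstract does not say whether the reported scores are per-class or averaged. AUROC is omitted because it requires predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as the ASD case.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (made up, not from the paper's dataset).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On a balanced dataset such as the one produced by the oversampling step, accuracy and F1 tend to move together; on the original imbalanced data they can diverge considerably, which is one reason to report all of them.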
Results indicated the following: (1) Out of 10 highly effective sociodemographic and family characteristic features, seven related to autism cases are extracted. (2) Correlation sensitivity analysis reveals that 'Mom_age_at_child_birth' has the highest positive correlation with 'Father_age_at_child_birth', with an r-value of 0.751. Moreover, 'child_birth_month' and 'Birth_number' have the highest negative correlation with 'Ses_points_1_10', with an r-value of −0.07. (3) AdaBoost, neural network, K-nearest neighbour, and decision tree methods show higher accuracy (0.9995, 0.9925, 0.9834, and 0.9786, respectively), whereas random forest, logistic regression, and Naive Bayes methods show relatively lower accuracy (0.8297, 0.8199 and 0.8002, respectively). The support vector machine method shows the lowest accuracy (0.7105). AdaBoost also obtained the highest scores on the four other evaluation metrics (AUC = 0.9999, F1 = 0.9995, precision = 0.9995 and recall = 0.9995). Accordingly, the new preprocessed and balanced ASD dataset can be utilised as a data source for autism research. The preprocessing stages can be considered correct and yield better results than the original ASD dataset. The Chi2 and Relief feature-selection approaches gave similar results and substantially improved the classification accuracy. The study confirms the efficacy of the proposed prediction model compared with previous models across several points of comparison. Early prediction of autism is possible through this proposed model.
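The r-values quoted in result (2) are correlation coefficients between pairs of features. The abstract does not state which coefficient was used; the sketch below assumes Pearson's r, and the function name `pearson_r` and the age values are invented for illustration rather than taken from the study's dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up parental ages at child birth, standing in for
# 'Mom_age_at_child_birth' and 'Father_age_at_child_birth'.
mom_age = [22, 27, 31, 35, 40]
father_age = [25, 29, 36, 37, 45]
r = pearson_r(mom_age, father_age)
print(round(r, 3))
```

A value near +1 indicates that the two features rise together, while a value near 0, such as the −0.07 reported above, indicates almost no linear relationship.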
computer science, artificial intelligence