Use of Machine Learning Models to Differentiate Neurodevelopment Conditions Through Digitally Collected Data: Cross-Sectional Questionnaire Study

Silvia Grazioli, Alessandro Crippa, Noemi Buo, Silvia Busti Ceccarelli, Massimo Molteni, Maria Nobile, Antonio Salandi, Sara Trabattoni, Gabriele Caselli, Paola Colombo
DOI: https://doi.org/10.2196/54577
2024-07-29
Abstract:
Background: Diagnosis of child and adolescent psychopathologies involves a multifaceted approach, integrating clinical observations, behavioral assessments, medical history, cognitive testing, and familial context information. Digital technologies, especially internet-based platforms for administering caregiver-rated questionnaires, are increasingly used in this field, particularly during the screening phase. The rise of digital platforms for data collection has brought advanced psychopathology classification methods, such as supervised machine learning (ML), to the forefront of both research and clinical environments. This shift, recently called psycho-informatics, has been facilitated by the gradual incorporation of computational devices into clinical workflows. However, a full integration of telemedicine and the ML approach has yet to be achieved.
Objective: Given these premises, exploring the potential of ML applications for analyzing digitally collected data may have significant implications for supporting the clinical practice of diagnosing early psychopathology. The purpose of this study was, therefore, to apply ML models to the classification of attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorder (ASD) using internet-based, parent-reported socio-anamnestic data, with the aim of obtaining accurate predictive models for new help-seeking families.
Methods: In this retrospective, single-center observational study, socio-anamnestic data were collected from 1688 children and adolescents referred for suspected neurodevelopmental conditions. The data included sociodemographic, clinical, environmental, and developmental factors, collected remotely through the first Italian internet-based screening tool for neurodevelopmental disorders, the Medea Information and Clinical Assessment On-Line (MedicalBIT). Random forest (RF), decision tree (DT), and logistic regression (LR) models were developed and evaluated using classification accuracy, sensitivity, specificity, and importance of independent variables.
Results: The RF model demonstrated robust accuracy, achieving 84% (95% CI 82-85; P<.001) for ADHD and 86% (95% CI 84-87; P<.001) for ASD classifications. Sensitivities were also high, with 93% for ADHD and 95% for ASD. In contrast, the DT and LR models exhibited lower accuracy (DT 74%, 95% CI 71-77; P<.001 for ADHD; DT 79%, 95% CI 77-82; P<.001 for ASD; LR 61%, 95% CI 57-64; P<.001 for ADHD; LR 63%, 95% CI 60-67; P<.001 for ASD) and sensitivities (DT: 82% for ADHD and 88% for ASD; LR: 62% for ADHD and 68% for ASD). The independent variables considered for classification differed in importance between the 2 models, reflecting the distinct characteristics of the 3 ML approaches.
Conclusions: This study highlights the potential of ML models, particularly RF, in enhancing the diagnostic process of child and adolescent psychopathology. Altogether, the current findings underscore the significance of leveraging digital platforms and computational techniques in the diagnostic process. While interpretability remains crucial, the developed approach might provide valuable screening tools for clinicians.
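As a minimal illustration of the evaluation metrics named in the Methods (accuracy, sensitivity, and specificity), the sketch below computes them from confusion-matrix counts. The counts are hypothetical and are not drawn from the study. The function names and the normal-approximation confidence interval are assumptions made for illustration, since the abstract does not state how its intervals were computed.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total      # share of all cases classified correctly
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    return accuracy, sensitivity, specificity


def wald_interval(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion (assumed method, not the paper's)."""
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts for illustration only; NOT data from the study.
tp, fp, tn, fn = 90, 20, 80, 10
acc, sens, spec = classification_metrics(tp, fp, tn, fn)
low, high = wald_interval(acc, tp + fp + tn + fn)

print(f"accuracy={acc:.2f} (95% CI {low:.2f} to {high:.2f})")
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

High sensitivity with lower specificity, as in this toy example, suits a screening setting in which missing a true case is costlier than a false alarm.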