Development and Validation of a Deep Learning System for the Diagnosis of Pediatric Diseases: A Large-Scale Real-World Data Study in Shanghai

Xiaoling Ge, Yi Wang, Li Xie, Yujuan Shang, Yihui Zhai, Zhiheng Huang, Jianfeng Huang, Chengjie Ye, Ao Ma, Wanting Li, Xiaobo Zhang, Hong Xu
DOI: https://doi.org/10.1101/2022.10.07.22280541
2022-10-12
medRxiv
Abstract: Background Artificial intelligence (AI)-assisted diagnosis is considered a future direction for improving the efficiency and accuracy of pediatric disease diagnosis. However, existing AI-based research remains insufficient because of limited data volume, inadequate coverage of disease types, or high construction costs, and has not been applied on a large scale. We aimed to develop an accurate deep learning model trained on millions of real-world records to verify the feasibility of the technology and to build the whole process of outpatient auxiliary diagnosis.
Methods and findings We applied Chinese natural language processing (NLP) and an end-to-end deep neural network classifier to outpatient electronic medical records (EMRs) from a single child care center in Shanghai, China, to process unstructured text and construct an auxiliary diagnostic model. All patients were aged 0 to 18 years. A training cohort with millions of records and an independent validation cohort with tens of thousands of records were enrolled separately, and the diagnosis concordance rate (DCR) of the model was calculated for each disease group. Records with inconsistent diagnoses between human and AI were evaluated by a group of clinical experts, and the relative correct rate (RCR) was calculated to evaluate the diagnostic performance of the model. A total of 5,271,347 medical records, covering sixteen categories of diseases according to disease coding, were included in model training, reaching a DCR of 95.49% (95.48~95.51). For validation, 91,880 records were obtained from the validation dataset, reaching a DCR of 93.51% (93.35~93.67) and an FDCR of 72.04% (71.75~72.33). The accuracy of the model was confirmed to remain higher than that of human diagnosis, with most RCR values above 1 in the validation dataset.
Conclusions The deep learning system could support the diagnosis of pediatric diseases. It offers high diagnostic performance, comprehensive disease coverage, and technical feasibility, and could be extended to multiple sites in the future.
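To make the reported DCR figures concrete, the following is a minimal sketch of how a concordance rate and its interval might be computed. It assumes that DCR is the fraction of records where the model's diagnosis matches the recorded human diagnosis, and that the bracketed range is a normal-approximation 95% interval; the abstract states neither explicitly. The function name `dcr_with_interval` is hypothetical, and the concordant count of 85,917 is back-calculated from the reported 93.51% of 91,880 records rather than taken from the paper.

```python
import math


def dcr_with_interval(n_concordant, n_total, z=1.96):
    """Return (rate, lower, upper) for a concordance proportion.

    Assumption: a normal-approximation (Wald) interval with z = 1.96;
    the abstract does not say which interval method was used.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_concordant <= n_total:
        raise ValueError("n_concordant must lie between 0 and n_total")
    rate = n_concordant / n_total
    half_width = z * math.sqrt(rate * (1 - rate) / n_total)
    # Clamp to [0, 1] because the Wald interval can overshoot near the edges.
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


# Validation cohort: 91,880 records; 85,917 is a back-calculated count
# (about 93.51% of 91,880), not a figure reported in the abstract.
rate, lower, upper = dcr_with_interval(85917, 91880)
print(f"{rate:.2%} ({lower:.2%}~{upper:.2%})")  # prints 93.51% (93.35%~93.67%)
```

Under these assumptions the sketch reproduces the validation range given in the abstract, which suggests, but does not confirm, that a similar interval was used.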