Abstract:

BACKGROUND Artificial intelligence–based assistive diagnostic systems imitate the deductive reasoning process of a human physician in biomedical disease diagnosis and treatment decision making. While impressive progress in this area has been reported, most of the reported successes are applications of artificial intelligence in Western medicine. The application of artificial intelligence in traditional Chinese medicine has lagged, mainly because traditional Chinese medicine practitioners need to perform syndrome differentiation as well as biomedical disease diagnosis before a treatment decision can be made. Syndrome, a concept unique to traditional Chinese medicine, is an abstraction of a variety of signs and symptoms. Because the relationship between diseases and syndromes is many-to-many rather than one-to-one, syndrome prediction is very challenging for a machine. So far, only a handful of artificial intelligence–based assistive traditional Chinese medicine diagnostic models have been reported, and each is limited in application to a single disease-type.

OBJECTIVE The objective was to develop an artificial intelligence–based assistive diagnostic system capable of diagnosing multiple types of diseases that are common in traditional Chinese medicine, given a patient's electronic health record notes. The system was designed to simultaneously diagnose the disease and produce a list of corresponding syndromes.

METHODS Unstructured free-text electronic health record notes were processed with natural language processing techniques to extract clinical information, such as signs and symptoms, which was represented as named entities. The natural language processing step used a recurrent neural network model, a bidirectional long short-term memory network with a conditional random field layer. A convolutional neural network was then used to predict the disease-type out of 187 diseases in traditional Chinese medicine. A novel traditional Chinese medicine syndrome prediction method, an integrated learning model, was used to produce a corresponding list of probable syndromes. By following a majority-rule voting method, the integrated learning model for syndrome prediction can take advantage of four existing prediction methods (back propagation, random forest, extreme gradient boosting, and support vector classifier) while avoiding their respective weaknesses, which resulted in a consistently high prediction accuracy.

RESULTS A data set consisting of 22,984 electronic health records from Guanganmen Hospital of the China Academy of Chinese Medical Sciences, collected between January 1, 2017 and September 7, 2018, was used. The data set contained a total of 187 diseases that are commonly diagnosed in traditional Chinese medicine. The diagnostic system was designed to be able to detect any one of the 187 disease-types. The data set was partitioned into a training set, a validation set, and a testing set in a ratio of 8:1:1. Test results suggested that the proposed system had good diagnostic accuracy and a strong capability for generalization. The top-one, top-three, and top-five disease-type prediction accuracies were 80.5%, 91.6%, and 94.2%, respectively.

CONCLUSIONS The main contributions of the artificial intelligence–based traditional Chinese medicine assistive diagnostic system proposed in this paper are that 187 commonly known traditional Chinese medicine diseases can be diagnosed and that a novel prediction method, called an integrated learning model, is demonstrated. This new prediction method outperformed all four existing methods in our preliminary experimental results. With further improvement of the algorithms and the availability of additional electronic health record data, it is expected that a wider range of traditional Chinese medicine disease-types could be diagnosed and that better diagnostic accuracies could be achieved.
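The majority-rule voting step described in the methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example syndrome labels, and the tie-breaking rule (earliest model wins) are all assumptions made for illustration, and the sketch simplifies the task to one syndrome label per record, whereas the abstract describes a list of probable syndromes.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the most base models.

    Ties go to the earliest model in the sequence. That rule is an
    assumption of this sketch, not something stated in the abstract.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk the predictions in model order so ties resolve deterministically.
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def ensemble_predict(per_model: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label lists (one list per model) record by record."""
    return [majority_vote(votes) for votes in zip(*per_model)]


# Hypothetical outputs of four base models on three records
# (labels are illustrative, not taken from the study's data).
bp = ["qi deficiency", "blood stasis", "damp heat"]
rf = ["qi deficiency", "damp heat", "damp heat"]
xgb = ["yin deficiency", "blood stasis", "damp heat"]
svc = ["qi deficiency", "blood stasis", "yin deficiency"]

combined = ensemble_predict([bp, rf, xgb, svc])
print(combined)
```

In this toy input each record has one label backed by three of the four models, so a single model's error is outvoted. That is the intuition behind combining the four methods, under the simplifying assumptions stated above.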

Multi-Task Learning for Symptom Name Recognition and Severity Assessment in Electronic Medical Records (Preprint)

Medical Big Data Mining: Joint Symptom Name Recognition and Severity Estimation

An Improved Multitask Learning Model with Matching Network and Its Application in Traditional Chinese Medicine Syndrome Recommendation

Multi-task learning for Chinese clinical named entity recognition with external knowledge

Combining the External Medical Knowledge Graph Embedding to Improve the Performance of Syndrome Differentiation Model

Efficient symptom inquiring and diagnosis via adaptive alignment of reinforcement learning and classification

MD-MTL: An Ensemble Med-Multi-Task Learning Package for DiseaseScores Prediction and Multi-Level Risk Factor Analysis

Deep learning-based natural language processing for detecting medical symptoms and histories in emergency patient triage

Enhancing traditional Chinese medicine diagnostics: Integrating ontological knowledge for multi-label symptom entity classification

Extracting Symptoms and their Status from Clinical Conversations

Traditional Chinese Medicine Symptom Normalization Approach Leveraging Hierarchical Semantic Information and Text Matching with Attention Mechanism

TCM-SD: A Benchmark for Probing Syndrome Differentiation via Natural Language Processing

Artificial Intelligence–Based Traditional Chinese Medicine Assistive Diagnostic System: Validation Study (Preprint)

TLDA: A transfer learning based dual-augmentation strategy for traditional Chinese Medicine syndrome differentiation in rare disease

Predicting Adverse Neonatal Outcomes for Preterm Neonates with Multi-Task Learning

Auxiliary Diagnosis Based on the Knowledge Graph of TCM Syndrome

Automated mood disorder symptoms monitoring from multivariate time-series sensory data: getting the full picture beyond a single number

Automated Multi-Task Learning for Joint Disease Prediction on Electronic Health Records

A Study on the Named Entity Recognition Method on Symptom Names in the History of Present Illness in Traditional Chinese Medical (TCM) Clinic

Natural Language Processing Algorithms for Normalizing Expressions of Synonymous Symptoms in Traditional Chinese Medicine

Multi-label classification of symptom terms from free-text bilingual adverse drug reaction reports using natural language processing