Abstract:
Background: The volume of medical documents in China has increased rapidly in recent years. We develop a method that automatically assigns ICD-10 codes to Chinese diagnoses in electronic medical records to support the medical coding process in Chinese hospitals.
Methods: We propose two encoding methods: one that directly determines the desired code (flat method), and one that descends the ICD-10 hierarchy, determining the most suitable code at each level until the desired code is reached (hierarchical method). Both methods are based on instances from the standard diagnostic library, a gold-standard dataset in China. For the first time, semantic similarity estimation between Chinese words is applied in the biomedical domain, with successful implementations of both knowledge-based and distributional approaches. Characteristics of the Chinese language are taken into account when implementing distributional semantics. We test our methods on 16,330 coding instances from our partner hospital.
Results: The hierarchical method outperforms the flat method in both accuracy and time complexity. Representing distributional semantics with Chinese characters can achieve performance comparable to using Chinese words. The diagnoses in the test set are encoded automatically with a micro-averaged precision of 92.57%, recall of 89.63%, and F-score of 91.08%. Encoding performance drops sharply without semantic similarity estimation.
Conclusion: Exploiting the hierarchical nature of ICD-10 codes enhances the performance of automated code assignment. Semantic similarity estimation is shown to be indispensable for processing Chinese medical text. The proposed method can greatly reduce the workload and improve the efficiency of the code assignment process in Chinese hospitals.
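To make the contrast between the flat and hierarchical methods concrete, here is a minimal Python sketch. It is not the authors' implementation: the toy ICD-10 fragment (categories J18 and J20 with shortened Chinese descriptions) and the names `Node`, `char_similarity`, `flat_assign`, and `hierarchical_assign` are hypothetical. In particular, a character-set Dice coefficient stands in for the knowledge-based and distributional similarity estimation described in the abstract; it operates on single Chinese characters, so no word segmentation is required.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical toy ICD-10 code tree."""
    code: str
    desc: str  # shortened Chinese description, for illustration only
    children: list[Node] = field(default_factory=list)


def char_similarity(a: str, b: str) -> float:
    """Dice coefficient over character sets.

    A simple stand-in (assumption) for the paper's semantic similarity
    estimation; it compares single Chinese characters, so no segmentation.
    """
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def leaves(node: Node) -> Iterator[Node]:
    """Yield every leaf (assignable code) under a node."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


def flat_assign(root: Node, diagnosis: str) -> str:
    """Flat method: compare the diagnosis against every leaf code at once."""
    best = max(leaves(root), key=lambda n: char_similarity(diagnosis, n.desc))
    return best.code


def hierarchical_assign(root: Node, diagnosis: str) -> str:
    """Hierarchical method: greedily pick the most similar child per level."""
    node = root
    while node.children:
        node = max(node.children, key=lambda n: char_similarity(diagnosis, n.desc))
    return node.code


# Toy fragment: J18 (pneumonia, organism unspecified) and J20 (acute bronchitis).
ROOT = Node("ROOT", "", [
    Node("J18", "肺炎", [
        Node("J18.0", "支气管肺炎"),   # bronchopneumonia, unspecified
        Node("J18.1", "大叶性肺炎"),   # lobar pneumonia, unspecified
        Node("J18.9", "肺炎"),         # pneumonia, unspecified
    ]),
    Node("J20", "急性支气管炎", [
        Node("J20.9", "急性支气管炎"),  # acute bronchitis, unspecified
    ]),
])

if __name__ == "__main__":
    print(hierarchical_assign(ROOT, "大叶性肺炎"))  # J18.1
    print(flat_assign(ROOT, "大叶性肺炎"))          # J18.1
```

The flat method scores every leaf code, whereas the hierarchical method scores only the children of the current node at each level, so in a full ICD-10 tree the number of comparisons grows with the tree's depth times its branching factor rather than with the total number of codes.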
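As a consistency check on the reported results, assume the F-score is the standard balanced F1 (the harmonic mean of precision and recall) computed from micro-averaged precision P and recall R, where TP, FP, and FN are pooled over all codes i:

```latex
\[
P = \frac{\sum_i TP_i}{\sum_i (TP_i + FP_i)}, \qquad
R = \frac{\sum_i TP_i}{\sum_i (TP_i + FN_i)}
\]
\[
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.9257 \times 0.8963}{0.9257 + 0.8963}
    = \frac{1.6594}{1.8220}
    \approx 0.9108
\]
```

This agrees with the reported F-score of 91.08%.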

Density-Based Clustering Algorithm for Hybrid Coding Detection in Search Engines

Web Page Classification Based on Heterogeneous Features and a Combination of Multiple Classifiers

A Hybrid Recommendation Algorithm Based on Clustering and Collaborative Filtering

A Hybrid Method for ICD-10 Auto-Coding of Chinese Diagnoses

An Online Clustering Algorithm for Chinese Web Snippets Based on Generalized Suffix Array

Image Cluster Algorithm of Hybrid Encoding Method

Adaptive Encoding-Based Evolutionary Approach for Chinese Document Clustering

PCCS: A Fast Clustering and Classification Method for Web Document

On Combining Link and Contents Information for Web Page Clustering

DBSCAN and K-Means Hybrid Clustering Based Automatic Dental Feature Detection

K-Means Clustering Analysis Based on Adaptive Weights for Malicious Code Detection

Specific Website Subject Recognition Based on the Hybrid Vector Space Model

Parallelized Near-Duplicate Document Detection Algorithm for Large Scale Chinese Web Pages

Content Caching Clustering Based on Piecewise Interest Similarity

A Hierarchical Method to Automatically Encode Chinese Diagnoses Through Semantic Similarity Estimation

Hierarchical Clustering of WWW Image Search Results Using Visual, Textual and Link Information

Learning to Cluster Web Search Results

Hierarchically Classifying Chinese Web Documents Without Dictionary Support and Segmentation Procedure

A Coding Hierarchy Computing Based Clustering Algorithm

New Automatic Categorization Algorithm for Chinese Homepages

C4-2: Combining Link and Contents in Clustering Web Search Results to Improve Information Interpretation