A Semi-Automatic Data Cleaning & Coding Tool for Chinese Clinical Data Standardization.

Yani Chen, Qi Tian, Hailing Cai, Xudong Lu
DOI: https://doi.org/10.3233/SHTI220041
2021-01-01
Abstract: Clinical data are often of limited use because the same concept is expressed in many different ways. Standardizing Chinese clinical data can improve their usability. Because cleaning and coding Chinese clinical data is complex, the inefficiency of manual coding has prompted a shift toward computer-aided tools. This study established a general-purpose data cleaning and coding process, together with a supporting tool, for Chinese clinical data standardization, which can greatly improve human efficiency. The process comprises preprocessing, a text similarity algorithm, and manual review. It proved effective for standardizing diagnosis, drug, and examination data, and it can be gradually extended to other clinical domains. Semi-automatic data cleaning and coding can halve the time needed for standardization, and the tool has been used in hospitals in Beijing.
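The abstract names three stages (preprocessing, a text similarity algorithm, manual review) but does not say which normalization steps or which similarity measure the authors used. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: preprocessing is assumed to be Unicode NFKC normalization plus removal of whitespace and punctuation, similarity is assumed to be Python's `difflib.SequenceMatcher.ratio()`, and the names `preprocess`, `code_term`, `VOCAB`, and the `threshold=0.8` cutoff are all hypothetical. The three ICD-10-style entries in `VOCAB` are a toy vocabulary for demonstration only.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical toy vocabulary (code -> standard term); ICD-10-style entries for illustration only.
VOCAB = {
    "E11": "2型糖尿病",
    "I10": "特发性(原发性)高血压",
    "J18": "肺炎",
}


def preprocess(text: str) -> str:
    """Assumed preprocessing: fold full-width characters, lowercase, drop spaces and punctuation."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width "２" -> "2", "，" -> ","
    text = text.lower()
    # \W is Unicode-aware for str patterns, so Chinese characters and digits are kept.
    return re.sub(r"[\W_]+", "", text)


def code_term(raw: str, vocab: dict, threshold: float = 0.8) -> dict:
    """Match one raw entry to its most similar standard term and flag weak matches for review."""
    query = preprocess(raw)
    best_code, best_term, best_score = None, None, -1.0
    for code, term in vocab.items():
        # Assumed similarity measure: SequenceMatcher ratio on the preprocessed strings.
        score = SequenceMatcher(None, query, preprocess(term)).ratio()
        if score > best_score:
            best_code, best_term, best_score = code, term, score
    return {
        "raw": raw,
        "code": best_code,
        "term": best_term,
        "score": round(best_score, 3),
        # Below the (hypothetical) threshold, a human reviewer confirms or corrects the code.
        "needs_review": best_score < threshold,
    }


if __name__ == "__main__":
    for raw in ["２型 糖尿病", "高血压病", "肺 炎，"]:
        print(code_term(raw, VOCAB))
```

In this sketch, "２型 糖尿病" normalizes to an exact match for E11 and is accepted automatically, while "高血压病" is matched to I10 with a low score and routed to manual review. This mirrors the semi-automatic split the abstract describes: the similarity step proposes a candidate for every entry, and only the uncertain ones consume reviewer time. A real deployment would likely tune the threshold per domain (diagnosis, drug, examination) and use a similarity measure suited to Chinese clinical terms.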