Comparison among Four Prominent Text Processing Tools

Jin Luo, Ruoyu Wang, Daniel Sun, Yingying Wang, Guoqiang Li
DOI: https://doi.org/10.1109/i-span.2018.00072
2018-01-01
Abstract: In the medical domain, the most common forms of information, such as medical records and medical books, are non-standard and unstructured. Unstructured information accounts for the majority of human communication, but it is difficult for computers to process and understand. In this survey, we illustrate the features of unstructured data and explain how the study of Intelligent Healthcare depends on it. We then introduce the UIMA framework, as well as three other natural language processing tools (KH Coder, WordStat, and DeepDive), and present a detailed comparison. We conclude that the future of exploiting unstructured data lies in establishing industrial standards and reducing unnecessary costs. Meanwhile, it is also necessary to use machine learning techniques to reduce the various kinds of uncertainty in unstructured data management.