Advancements in eHealth Data Analytics through Natural Language Processing and Deep Learning

Elena-Simona Apostol, Ciprian-Octavian Truică
2024-01-20
Abstract: The healthcare environment is commonly referred to as "information-rich" but also "knowledge poor". Healthcare systems collect huge amounts of data from various sources: lab reports, medical letters, logs of medical tools or programs, medical prescriptions, etc. These massive data sets can provide knowledge and information that improve medical services and the healthcare domain overall, for example disease prediction by analyzing a patient's symptoms, or disease prevention by facilitating the discovery of behavioral factors for diseases. Unfortunately, only a relatively small volume of the textual eHealth data is processed and interpreted, an important factor being the difficulty of efficiently performing Big Data operations. In the medical field, detecting domain-specific multi-word terms is a crucial task, as they can define an entire concept with a few words. A term can be defined as a linguistic structure or a concept, composed of one or more words with a meaning specific to a domain. All the terms of a domain form its terminology. This chapter offers a critical study of the current best-performing solutions for analyzing unstructured (image and textual) eHealth data. This study also provides a comparison of the current Natural Language Processing and Deep Learning techniques in the eHealth context. Finally, we examine and discuss some of the current issues, and we define a set of research directions in this area.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the effective processing and interpretation of the large amounts of unstructured electronic health data produced in medical environments: laboratory reports, medical letters, medical device logs, medical prescriptions, etc. Although medical systems collect vast amounts of data, only a relatively small share of it is processed and interpreted, especially the data in text form. An important factor is the difficulty of efficiently performing Big Data operations. In the medical field, a further critical task is detecting domain-specific multi-word terms, which can define an entire concept with a few words. The paper therefore surveys the best-performing current Natural Language Processing (NLP) and Deep Learning (DL) techniques for analyzing unstructured electronic health data, compares these techniques, discusses open issues, and proposes future research directions.