Contextual classification of clinical records with bidirectional long short‐term memory (Bi‐LSTM) and bidirectional encoder representations from transformers (BERT) model

Jaya Zalte, Harshal Shah
DOI: https://doi.org/10.1111/coin.12692
2024-08-24
Computational Intelligence
Abstract: Deep learning models have outperformed traditional machine learning techniques for text classification in the field of natural language processing (NLP). Because NLP is a branch of machine learning used for interpreting language and classifying text of interest, it can also be applied to analyse clinical electronic health records. Medical text contains a large amount of rich data which, taken together, can provide good insight once patterns are determined from the clinical text. In this paper, bidirectional long short‐term memory (Bi‐LSTM), Bi‐LSTM with attention, and bidirectional encoder representations from transformers (BERT) base models are used to classify text that is of privacy concern to a person and that should be extracted and tagged as sensitive. Text that might not appear to be of privacy concern can reveal a great deal about the patient's integrity and personal life. Clinical data contain not only patient demographic data but also a lot of hidden data that might go unseen and could therefore give rise to privacy issues. Of about 206,926 sentences, 80% are used for training and the rest for testing. Bi‐LSTM alone achieves an accuracy of approximately 90%. With an attention layer added on top of the Bi‐LSTM to capture the importance of the critical words that matter most for classification, accuracy is about 92%. The BERT model, applied to the same dataset, achieves an accuracy of approximately 93%.
computer science, artificial intelligence
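The 80/20 split mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the sentences were partitioned, so the shuffling, the seed, the rounding rule, and the helper name `split_indices` are all assumptions made here for illustration; only the sentence count (about 206,926) and the 80% training share come from the abstract.

```python
import random


def split_indices(n_sentences, train_fraction=0.8, seed=0):
    """Shuffle sentence indices and split them into train and test lists.

    Hypothetical helper: the shuffle, seed, and floor rounding are
    assumptions, not details taken from the paper.
    """
    indices = list(range(n_sentences))
    random.Random(seed).shuffle(indices)
    n_train = int(n_sentences * train_fraction)  # floor of 80% of the sentences
    return indices[:n_train], indices[n_train:]


# Sentence count as reported (approximately) in the abstract.
train_idx, test_idx = split_indices(206_926)
print(len(train_idx), len(test_idx))  # 165540 41386
```

Under these assumptions, roughly 165,540 sentences would go to training and 41,386 to testing, which gives a sense of the size of the test set behind the reported accuracies.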