Fine-tuning the BERTSUMEXT model for Clinical Report Summarization

Pooja Vinod, Seema Safar, Divins Mathew, Parvathy Venugopal, Linta Merin Joly, Joish George
DOI: https://doi.org/10.1109/incet49848.2020.9154087
2020-06-01
Abstract: Background: Medical personnel are expected to parse through scores of reports each day covering the medical history of their patients. This reading task is crucial to the effectiveness of the healthcare provided. However, it has been observed that doctors often spend a lot of time going through these documents to get a concise gist of the most medically relevant details, which can reduce the time left for doctor-patient interaction. It is in this scenario that the potential usefulness of an automatic clinical report summarization tool becomes apparent. Such a system would save the doctor considerable effort and free up time for quality patient-doctor interaction. The focus of this paper is on extractive summarization. Method: Due to its vast pre-training, BERT (Bidirectional Encoder Representations from Transformers) is one of the most knowledgeable NLP (Natural Language Processing) models currently available, making it one of the best choices for a task like summarization. BERTSUM is the version of BERT fine-tuned for summarization, and BERTSUMEXT is its extractive summarization variant. The BERTSUMEXT architecture has previously been used to create a model extensively trained on the CNN/DailyMail dataset of news articles and corresponding summaries. Testing showed that this pre-trained version of BERTSUMEXT does not perform very well on clinical reports and therefore needs to be improved before it can be employed in a clinical report summarization system. The method adopted here is to further train the BERTSUMEXT model on a clinical report summarization dataset using different training strategies and to assess the resulting performance improvement. The idea is to expand BERTSUMEXT's knowledge to give it the 'medical edge' that it lacks.
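Extractive summarization, as opposed to abstractive, builds the summary by copying whole sentences from the source report. A minimal sketch of the final selection step, assuming each sentence already has a relevance score (the scores below are hypothetical placeholders for what a model such as BERTSUMEXT would output per sentence): rank sentences by score, keep the top k, and optionally skip any candidate that shares a trigram with sentences already chosen. Trigram blocking is a redundancy heuristic associated with the original BERTSUM work; whether this paper uses it, and the value of k, are assumptions here.

```python
def trigrams(sentence: str) -> set[tuple[str, ...]]:
    """Lowercased whitespace-token trigrams of a sentence."""
    tokens = sentence.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def select_sentences(sentences: list[str], scores: list[float], k: int = 3,
                     block_trigrams: bool = True) -> list[str]:
    """Pick up to k sentences by descending score, returned in document order.

    `scores` stands in for per-sentence model outputs (hypothetical here).
    """
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen: list[int] = []
    seen: set[tuple[str, ...]] = set()
    for i in ranked:
        tri = trigrams(sentences[i])
        # Skip a candidate that repeats a trigram already in the summary.
        if block_trigrams and tri & seen:
            continue
        chosen.append(i)
        seen |= tri
        if len(chosen) == k:
            break
    # Restore original report order so the summary reads naturally.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    report = [
        "Patient admitted with chest pain.",
        "Family history is unremarkable.",
        "ECG showed ST elevation in leads II, III and aVF.",
        "Patient admitted with chest pain and shortness of breath.",
        "Discharged on aspirin and a statin.",
    ]
    fake_scores = [0.91, 0.10, 0.85, 0.88, 0.60]  # hypothetical model scores
    for line in select_sentences(report, fake_scores):
        print(line)
```

With blocking enabled, the near-duplicate fourth sentence is skipped despite its high score, so the summary keeps sentences 1, 3 and 5; with blocking disabled it would keep sentences 1, 3 and 4.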
Results: The training strategy that modifies the parameter values of the extractive summarization layers of the BERTSUMEXT architecture shows a clear improvement on all nine scores of the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) automatic evaluation metric, as well as under the human evaluation paradigm. The ROUGE metric evaluates summary quality by measuring the overlap between the reference (gold) summary and the candidate summary generated by the model. In the human evaluation paradigm, a professional doctor gives an opinion on the quality of the summaries produced by the model.
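To make the overlap idea concrete, the following is a simplified sketch of ROUGE scoring. It assumes the nine scores are precision, recall and F1 for each of ROUGE-1, ROUGE-2 and ROUGE-L, which is a common reporting convention but is not stated above. ROUGE-N counts clipped n-gram matches; ROUGE-L uses the longest common subsequence. The tokenizer is a plain lowercase whitespace split, and the stemming and stopword options of official ROUGE packages are omitted, so exact numbers may differ from those tools.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer; official ROUGE tooling tokenizes differently.
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap: int, cand_total: int, ref_total: int) -> tuple[float, float, float]:
    """Precision, recall and balanced F1 from match counts."""
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def rouge_n(candidate: str, reference: str, n: int) -> tuple[float, float, float]:
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    # Counter intersection clips each match to the smaller of the two counts.
    overlap = sum((cand & ref).values())
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming longest common subsequence, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> tuple[float, float, float]:
    c, r = tokenize(candidate), tokenize(reference)
    return prf(lcs_length(c, r), len(c), len(r))


def rouge_scores(candidate: str, reference: str) -> dict[str, tuple[float, float, float]]:
    """Nine numbers: (precision, recall, F1) for ROUGE-1, ROUGE-2 and ROUGE-L."""
    return {
        "rouge-1": rouge_n(candidate, reference, 1),
        "rouge-2": rouge_n(candidate, reference, 2),
        "rouge-l": rouge_l(candidate, reference),
    }


if __name__ == "__main__":
    gold = "the patient has a history of hypertension"
    generated = "patient has hypertension"
    for name, (p, r, f) in rouge_scores(generated, gold).items():
        print(f"{name}: P={p:.3f} R={r:.3f} F1={f:.3f}")
```

For this toy pair, every candidate unigram appears in the reference (ROUGE-1 precision 1.0) but only 3 of 7 reference unigrams are recovered (recall about 0.43), which illustrates why recall is often the headline number when judging whether a summary captures the gold content.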