Large Multimodal Model based Standardisation of Pathology Reports with Confidence and their Prognostic Significance

Ethar Alzaid, Gabriele Pergola, Harriet Evans, David Snead, Fayyaz Minhas

2024-05-03

Abstract: Pathology reports are rich in clinical and pathological details but are often presented in free-text format. The unstructured nature of these reports presents a significant challenge, limiting the accessibility of their content. In this work, we present a practical approach based on large multimodal models (LMMs) for automatically extracting information from scanned images of pathology reports, with the goal of generating a standardised report that specifies the value of different fields along with an estimated confidence in the accuracy of each extracted field. The proposed approach overcomes a limitation of existing methods, which do not assign confidence scores to extracted fields and are therefore of limited practical use. The proposed framework uses two stages of prompting an LMM: one for information extraction and one for validation. The framework generalises to textual reports from multiple medical centres as well as scanned images of legacy pathology reports. We show that the estimated confidence is an effective indicator of the accuracy of the extracted information and can be used to select only accurately extracted fields. We also show the prognostic significance of structured and unstructured data from pathology reports, and show that the automatically extracted field values have significant prognostic value for patient stratification. The framework is available for evaluation via the URL:

Computation and Language

What problem does this paper attempt to address?

This paper addresses the standardization of pathology reports and the extraction of information from them. Specifically:

1. **Information extraction from unstructured pathology reports**: Although pathology reports contain rich clinical and pathological details, they are usually written as free text. This unstructured form limits how accessible and usable their content is. Existing extraction methods do not provide confidence scores for the extracted fields, which limits their practical value.
2. **Improving the reliability of information extraction**: The proposed method uses large multimodal models (LMMs) to automatically extract information from scanned images of pathology reports, generate standardized reports, and estimate a confidence level for each extracted field. The confidence scores let users accept or reject individual fields, so that only accurately extracted values are retained.
3. **Analyzing the prognostic significance of pathology reports**: Beyond extraction and standardization, the paper explores the prognostic value of structured and unstructured pathology report data. It reports that the automatically extracted field values have significant prognostic value for patient stratification, which could inform personalized treatment planning.

In conclusion, the main goal of this paper is to develop a two-stage information extraction framework based on large multimodal models that overcomes the limitations of existing methods, improves the accuracy and reliability of information extraction from pathology reports, and explores the potential value of this information in clinical decision-making.
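The two-stage idea summarised above (extract a value, then validate it and attach a confidence, then keep only confident fields) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `FieldResult`, `standardise_report`, `select_confident`, the 0.8 threshold, and the example field names are all hypothetical, and the two toy functions stand in for the LMM prompts, which the page does not describe.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FieldResult:
    value: str
    confidence: float  # assumed to lie in [0, 1]


def standardise_report(
    report: str,
    fields: List[str],
    extract: Callable[[str, str], str],
    validate: Callable[[str, str, str], float],
) -> Dict[str, FieldResult]:
    """Stage 1 extracts each field; stage 2 scores the extracted value."""
    results = {}
    for name in fields:
        value = extract(report, name)                # stage 1: extraction
        confidence = validate(report, name, value)   # stage 2: validation
        results[name] = FieldResult(value, confidence)
    return results


def select_confident(
    results: Dict[str, FieldResult], threshold: float = 0.8
) -> Dict[str, str]:
    """Keep only fields whose confidence meets the (hypothetical) threshold."""
    return {k: r.value for k, r in results.items() if r.confidence >= threshold}


# Toy stand-ins for the two LMM prompts (assumptions, for illustration only).
def toy_extract(report: str, field: str) -> str:
    # Look for a "Field: value" line; return an empty string if none is found.
    for line in report.splitlines():
        key, sep, val = line.partition(":")
        if sep and key.strip().lower() == field.lower():
            return val.strip()
    return ""


def toy_validate(report: str, field: str, value: str) -> float:
    # Crude check: a non-empty value that appears in the report is trusted.
    return 1.0 if value and value in report else 0.0


report = "Grade: 2\nTumour size: 15 mm"
results = standardise_report(
    report, ["Grade", "Tumour size", "Margin status"], toy_extract, toy_validate
)
accepted = select_confident(results)
print(accepted)
```

Here the missing "Margin status" field receives zero confidence and is dropped, while the two fields present in the report are kept. In a real system the extractor and validator would be model calls, and the threshold would need to be calibrated against labelled reports.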

Medical Report Generation Via Multimodal Spatio-Temporal Fusion

Extraction and classification of structured data from unstructured hepatobiliary pathology reports using large language models: a feasibility study compared with rules-based natural language processing

Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports

Clinical-grade Multi-Organ Pathology Report Generation for Multi-scale Whole Slide Images via a Semantically Guided Medical Text Foundation Model

Large language models for extracting histopathologic diagnoses from electronic health records

Synoptic Reporting by Summarizing Cancer Pathology Reports using Large Language Models

Exploring Multimodal Large Language Models for Radiology Report Error-checking

Enhancing Clinical Data Extraction from Pathology Reports: A Comparative Analysis of Large Language Models

Utility of Multimodal Large Language Models in Analyzing Chest X-ray with Incomplete Contextual Information

Automated Generation of Synoptic Reports from Narrative Pathology Reports in University Malaya Medical Centre Using Natural Language Processing

Natural Language Processing to extract SNOMED-CT codes from pathological reports

Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook

Abstract 890: Multimodal modeling of digitized histopathology slides improves risk stratification in hormone receptor-positive breast cancer patients

Extracting lung cancer staging descriptors from pathology reports: A generative language model approach

Resource-Efficient Medical Report Generation using Large Language Models

Automated Classification of Free-text Pathology Reports for Registration of Incident Cases of Cancer

Multimodal representations of biomedical knowledge from limited training whole slide images and reports using deep learning

HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction

Validation of large language models for detecting pathologic complete response in breast cancer using population-based pathology reports