HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction

Zhengrui Guo, Jiabo Ma, Yingxue Xu, Yihui Wang, Liansheng Wang, Hao Chen
2024-06-18
Abstract: Histopathology serves as the gold standard in cancer diagnosis, with clinical reports being vital in interpreting and understanding this process, guiding cancer treatment and patient care. The automation of histopathology report generation with deep learning stands to significantly enhance clinical efficiency and lessen the labor-intensive, time-consuming burden on pathologists in report writing. In pursuit of this advancement, we introduce HistGen, a multiple instance learning-empowered framework for histopathology report generation together with the first benchmark dataset for evaluation. Inspired by diagnostic and report-writing workflows, HistGen features two delicately designed modules, aiming to boost report generation by aligning whole slide images (WSIs) and diagnostic reports from local and global granularity. To achieve this, a local-global hierarchical encoder is developed for efficient visual feature aggregation from a region-to-slide perspective. Meanwhile, a cross-modal context module is proposed to explicitly facilitate alignment and interaction between distinct modalities, effectively bridging the gap between the extensive visual sequences of WSIs and corresponding highly summarized reports. Experimental results on WSI report generation show the proposed model outperforms state-of-the-art (SOTA) models by a large margin. Moreover, the results of fine-tuning our model on cancer subtyping and survival analysis tasks further demonstrate superior performance compared to SOTA methods, showcasing strong transfer learning capability. Dataset, model weights, and source code are available at https://github.com/dddavid4real/HistGen.
Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses the automation of histopathology report generation, an important step in cancer diagnosis. Report writing is currently done manually, which is time-consuming and prone to errors. The authors propose HistGen, a framework built on multiple instance learning (MIL), with two key modules: a local-global hierarchical encoder that efficiently aggregates visual features from regions up to the whole slide, and a cross-modal context module that promotes alignment and interaction between images and text. They also curate a benchmark dataset of approximately 7,800 pairs of whole slide images (WSIs) and diagnostic reports, and pretrain a general-purpose MIL feature extractor to strengthen feature encoding. In experiments, HistGen outperforms existing methods on pathology report generation by a large margin and, after fine-tuning, also performs better than prior methods on cancer subtyping and survival analysis, which indicates strong transfer learning capability. Overall, the paper aims to improve clinical efficiency, reduce the workload of pathologists, and support pathological analysis through automated report generation.
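
To make the region-to-slide idea concrete, here is a minimal sketch of two-stage aggregation in plain Python. It is not the authors' implementation. The function names, the use of mean pooling for the local step, the dot-product attention for the global step, the random (rather than learned) query vector, and the grouping of consecutive patches into regions are all assumptions made for illustration.

```python
import math
import random


def mean_pool(vectors):
    """Average a list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Weight vectors by the softmax of their dot product with a query."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def encode_slide(patch_features, region_size, query):
    """Hypothetical local-global encoder: patches -> regions -> slide."""
    # Local step: group consecutive patches into regions, pool each region.
    # (Assumption: a real pipeline would group by spatial coordinates.)
    regions = [patch_features[i:i + region_size]
               for i in range(0, len(patch_features), region_size)]
    region_tokens = [mean_pool(r) for r in regions]
    # Global step: aggregate region tokens into one slide-level vector.
    slide_vector, weights = attention_pool(region_tokens, query)
    return region_tokens, slide_vector, weights


# Toy data: 50 patch embeddings of dimension 8 stand in for one WSI.
rng = random.Random(0)
patches = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]
query = [rng.gauss(0, 1) for _ in range(8)]  # would be learned in practice

region_tokens, slide_vector, weights = encode_slide(patches, 16, query)
print(len(region_tokens), len(slide_vector))
```

With 50 patches and a region size of 16, the sketch produces four region tokens (the last region holds the two leftover patches) and one 8-dimensional slide vector. The point of the two stages is that the global step operates on a few region tokens instead of every patch, which is what keeps aggregation tractable for the very long patch sequences of a WSI.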