Predicting the Visual Attention of Pathologists Evaluating Whole Slide Images of Cancer

Souradeep Chakraborty, Rajarsi Gupta, Ke Ma, Darshana Govind, Pinaki Sarder, Won-Tak Choi, Waqas Mahmud, Eric Yee, Felicia Allard, Beatrice Knudsen, Gregory Zelinsky, Joel Saltz, Dimitris Samaras
DOI: https://doi.org/10.1007/978-3-031-16961-8_2
2022-01-01
Abstract: This work presents PathAttFormer, a deep learning model that predicts the visual attention of pathologists viewing whole slide images (WSIs) while evaluating cancer. The model has two main components: (1) a patch-wise attention prediction module using a Swin transformer backbone and (2) a self-attention-based attention refinement module that computes pairwise similarity between patches to predict spatially consistent attention heatmaps. We observed a high level of agreement between model predictions and actual viewing behavior, which was collected by capturing panning and zooming movements using a digital microscope interface. Visual attention was analyzed in the evaluation of prostate cancer and gastrointestinal neuroendocrine tumors (GI-NETs), which differ greatly in terms of diagnostic paradigms and the demands on attention. Prostate cancer involves examining WSIs stained with Hematoxylin and Eosin (H&E) to identify distinct growth patterns for Gleason grading. In contrast, GI-NETs require a multi-step approach of identifying tumor regions in H&E WSIs and grading by quantifying the number of Ki-67 positive tumor cells highlighted with immunohistochemistry (IHC) in a separate image. We collected attention data from pathologists viewing prostate cancer H&E WSIs from The Cancer Genome Atlas (TCGA) and 21 H&E WSIs of GI-NETs with corresponding Ki-67 IHC WSIs. This is the first work that utilizes the Swin transformer architecture to predict visual attention in histopathology images of GI-NETs, and the approach is generalizable to predicting attention in the evaluation of multiple sequential images in real-world diagnostic pathology and IHC applications.
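As an illustration of the second component, the following is a minimal sketch of how pairwise patch similarity could be used to smooth patch-wise attention scores. It is not the authors' implementation: the function name `refine_attention`, the `temperature` parameter, and the toy feature vectors and scores are all hypothetical, and in the actual model the patch features would come from the Swin transformer backbone rather than being written by hand.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def refine_attention(features, scores, temperature=1.0):
    """Smooth patch-wise scores using pairwise similarity between patch features.

    features: one feature vector per patch (hypothetical stand-in for backbone output)
    scores:   one initial attention score per patch
    Returns one refined score per patch, each a similarity-weighted average of all scores.
    """
    dim = len(features[0])
    scale = temperature * math.sqrt(dim)
    refined = []
    for f_i in features:
        # Scaled dot-product similarity between patch i and every patch j.
        sims = [sum(a * b for a, b in zip(f_i, f_j)) / scale for f_j in features]
        weights = softmax(sims)
        refined.append(sum(w * s for w, s in zip(weights, scores)))
    return refined


# Toy example (assumed values): patches 0 and 1 look alike, patch 2 differs.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
scores = [0.9, 0.1, 0.2]
refined = refine_attention(features, scores)
print(refined)
```

In this toy setting the score of patch 1 is pulled toward that of its look-alike neighbor, patch 0, which is the sense in which similarity-based refinement encourages consistent heatmaps across similar patches.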