Abstract: Whole slide images (WSIs) pose unique challenges when training deep learning models. They are very large, which makes it necessary to break each image down into smaller patches for analysis; image features have to be extracted at multiple scales in order to capture both detail and context; and extreme class imbalances may exist. Significant progress has been made in the analysis of these images, largely due to the availability of public annotated datasets. We postulate, however, that even if a method scores well on a challenge task, this success may not translate to good performance in a more clinically relevant workflow. Many datasets consist of image patches, which may suffer from data curation bias; other datasets are only labelled at the whole slide level, and the lack of annotations across an image may mask erroneous local predictions so long as the final decision is correct. In this paper, we outline the differences between patch or slide-level classification and methods that need to localize or segment cancer accurately across the whole slide, and we experimentally verify that best practices differ between the two cases. We apply a binary cancer detection network to post neoadjuvant therapy breast cancer WSIs to find the tumor bed outlining the extent of cancer, a task which requires sensitivity and precision across the whole slide. We extensively study multiple design choices and their effects on the outcome, including architectures and augmentations. Furthermore, we propose a negative data sampling strategy, which drastically reduces the false positive rate (7% on slide level) and improves each metric pertinent to our problem, with a 15% reduction in the error of tumor extent.

What problem does this paper attempt to address?

The paper addresses the challenges of detecting cancer in whole slide images (WSIs). Specifically, it focuses on the following aspects:

1. **Large-scale image processing**: WSIs are very large and usually have to be decomposed into smaller image patches for analysis. Features must be extracted at multiple scales to capture both fine detail and surrounding context.
2. **Class imbalance**: Extreme class imbalance may exist in WSIs when training deep learning models; that is, some classes have far more samples than others.
3. **Data bias**: Many publicly available datasets consist of expert-annotated image patches and may be affected by data curation bias. For example, training and validation datasets are usually collected by the same experts or under the same guidelines, resulting in a higher proportion of positive-class samples (such as cancer tissue).
4. **Applicability to clinical workflows**: Even if a method performs well on a challenge task, that success may not translate into good performance in a clinical workflow. The paper points out that other datasets are annotated only at the whole-slide level, so erroneous local predictions can go unnoticed as long as the final slide-level decision is correct.

To overcome these problems, the paper proposes a negative data sampling strategy that improves model performance by reducing the false-positive rate. It also investigates the impact of different design choices (such as architectures and augmentations) and verifies them experimentally on WSIs of breast cancer after neoadjuvant therapy (NAT).

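
The patch decomposition mentioned in the first item can be sketched as a sliding-window grid over the slide. This is a minimal illustration, not the paper's pipeline: the function name `tile_coordinates` and the `patch` and `stride` parameters are hypothetical, and reading the actual pixels (for example with a WSI library) is left out.

```python
from typing import Iterator, Tuple


def tile_coordinates(
    width: int, height: int, patch: int, stride: int
) -> Iterator[Tuple[int, int]]:
    """Yield top-left (x, y) corners of square patches covering the image.

    Hypothetical helper: `patch` is the patch side length in pixels and
    `stride` is the step between neighbouring patches (stride < patch
    gives overlapping windows).
    """
    if patch <= 0 or stride <= 0:
        raise ValueError("patch and stride must be positive")
    if patch > width or patch > height:
        raise ValueError("patch is larger than the image")

    xs = list(range(0, width - patch + 1, stride))
    ys = list(range(0, height - patch + 1, stride))

    # Add a final column/row flush with the border so that tissue at the
    # right and bottom edges is not skipped when the stride does not
    # divide the image size evenly.
    if xs[-1] != width - patch:
        xs.append(width - patch)
    if ys[-1] != height - patch:
        ys.append(height - patch)

    for y in ys:
        for x in xs:
            yield x, y


# Example: a 10 x 10 image with 4 x 4 patches and stride 4.
coords = list(tile_coordinates(10, 10, patch=4, stride=4))
```

Running the same grid at several magnification levels is one simple way to obtain the multi-scale view the summary describes; how the paper combines scales is not stated in this excerpt.
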
Specifically, the main contributions of the paper include:

- **Negative-sample sampling strategy**: A negative-sample sampling method based on feature clustering is proposed. According to the abstract, it drastically reduces the false-positive rate (7% at the slide level) and yields a 15% reduction in the error of tumor extent.
- **Multi-task comparative study**: Best practices for patch-level classification, slide-level classification, and slide-level segmentation are compared, and they are found to differ significantly between tasks.
- **Study of model complexity and augmentation methods**: The impact of model complexity and image augmentation on task performance is studied, and the EfficientNet-B3 model is found to provide the best bias-variance balance in sliding-window tasks.

In summary, this paper aims to improve the accuracy and robustness of cancer detection in WSIs by improving the negative-sample sampling strategy and optimizing model design.
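
The summary only says that negative sampling is "based on feature clustering" and does not give the procedure. One plausible reading, sketched here under that assumption, is to cluster the feature vectors of negative (non-cancer) patches and draw the same number of patches from every cluster, so that rare tissue types are not drowned out by common ones. The function names, the use of plain k-means, and the equal-per-cluster quota are all illustrative choices, not the authors' method.

```python
import numpy as np


def kmeans_labels(features: np.ndarray, k: int, n_iter: int = 20) -> np.ndarray:
    """Cluster rows of `features` with Lloyd's k-means; return one label per row."""
    # Deterministic farthest-point initialisation: start from row 0, then
    # repeatedly add the row farthest from all centres chosen so far.
    centers = [features[0]]
    for _ in range(1, k):
        dists = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[int(dists.argmax())])
    centers = np.array(centers, dtype=float)

    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every row to every centre, shape (n, k).
        sq = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


def sample_negatives_by_cluster(features, n_samples: int, k: int, seed: int = 0):
    """Return sorted indices of negative patches, drawn evenly across clusters.

    Hypothetical helper: each cluster contributes at most n_samples // k
    patches, sampled without replacement.
    """
    features = np.asarray(features, dtype=float)
    labels = kmeans_labels(features, k)
    rng = np.random.default_rng(seed)
    per_cluster = n_samples // k
    chosen = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        take = min(per_cluster, len(idx))
        if take > 0:
            chosen.extend(rng.choice(idx, size=take, replace=False).tolist())
    return sorted(chosen)
```

With uniform random sampling, a tissue type that makes up 10% of the negative patches would contribute roughly 10% of the training negatives; the per-cluster quota raises that share to 1/k. In practice the features would come from a trained network's embedding layer, which is an assumption here as well.
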

Overcoming the limitations of patch-based learning to detect cancer in whole slide images

Accurate and reproducible invasive breast cancer detection in whole-slide images: A Deep Learning approach for quantifying tumor extent

Robust whole slide image analysis for cervical cancer screening using deep learning

Weakly Supervised Deep Learning for Whole Slide Lung Cancer Image Analysis

Multi-label Recognition of Cancer-Related Lesions with Clinical Priors on White-Light Endoscopy

Classifying Whole Slide Images: What Matters?

Finding Regions of Interest in Whole Slide Images Using Multiple Instance Learning

Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification

Whole-slide-imaging Cancer Metastases Detection and Localization with Limited Tumorous Data

Automatic Whole Slide Pathology Image Diagnosis Framework Via Unit Stochastic Selection and Attention Fusion

Computer-aided Detection of Squamous Carcinoma of the Cervix in Whole Slide Images

Contrastive learning-based histopathological features infer molecular subtypes and clinical outcomes of breast cancer from unannotated whole slide images

Fast Whole Slide Image Analysis of Cervical Cancer Using Weak Annotation

BM-Net: CNN-Based MobileNet-V3 and Bilinear Structure for Breast Cancer Detection in Whole Slide Images

Whole Slide Image Multi-Classification of Cervical Epithelial Lesions Based on Unsupervised Pre-training

Data-efficient and weakly supervised computational pathology on whole-slide images

Slide-based Graph Collaborative Training for Histopathology Whole Slide Image Analysis

End-to-end Learning for Image-based Detection of Molecular Alterations in Digital Pathology

Efficient Classification of Histopathology Images

Dual-path network with synergistic grouping loss and evidence driven risk stratification for whole slide cervical image analysis

Automatic diagnosis and grading of Prostate Cancer with weakly supervised learning on whole slide images