Multi-task learning for tissue segmentation and tumor detection in colorectal cancer histology slides

Lydia A. Schoenpflug, Maxime W. Lafarge, Anja L. Frei, Viktor H. Koelzer
2023-04-06
Abstract: Automating tissue segmentation and tumor detection in histopathology images of colorectal cancer (CRC) is an enabler for faster diagnostic pathology workflows. At the same time, it is a challenging task due to low availability of public annotated datasets and high variability of image appearance. The semi-supervised learning for CRC detection (SemiCOL) challenge 2023 provides partially annotated data to encourage the development of automated solutions for tissue segmentation and tumor detection. We propose a U-Net based multi-task model combined with channel-wise and image-statistics-based color augmentations, as well as test-time augmentation, as a candidate solution to the SemiCOL challenge. Our approach achieved a multi-task Dice score of 0.8655 (Arm 1) and 0.8515 (Arm 2) for tissue segmentation and AUROC of 0.9725 (Arm 1) and 0.9750 (Arm 2) for tumor detection on the challenge validation set. The source code for our approach is made publicly available at https://github.com/lely475/CTPLab_SemiCOL2023.
Image and Video Processing, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to address the automation of tissue segmentation and tumor detection in colorectal cancer (CRC) tissue sections. Specifically:

1. **Research Background**:
   - Colorectal cancer is one of the leading causes of cancer-related deaths.
   - One goal of digital pathology is to accelerate diagnostic workflows by automating routine tasks such as analyzing epithelial tumor tissue in biopsy and colectomy samples.

2. **Challenges**:
   - The scarcity of public annotated datasets and the high variability in image appearance make automated tissue segmentation and tumor detection challenging.
   - The Semi-Supervised Learning for CRC Detection (SemiCOL) challenge provides partially annotated datasets, encouraging the development of automated solutions.

3. **Proposed Solution**:
   - A U-Net-based multi-task model is proposed, incorporating channel-wise color augmentation, image-statistics-based color augmentation, and test-time augmentation (TTA).
   - On the challenge validation set, the model achieves multi-task Dice scores of 0.8655 (Arm 1) and 0.8515 (Arm 2) for tissue segmentation, and AUROC scores of 0.9725 (Arm 1) and 0.9750 (Arm 2) for tumor detection.

4. **Main Methods**:
   - The model architecture is based on U-Net, with an encoder for feature extraction, a decoder head for tissue segmentation, and a fully connected classification head for tumor detection.
   - Weakly annotated data is used for semi-supervised learning during training, and data augmentation is employed to improve the model's generalization ability.
   - For inference, only the segmentation branch is used, and the tumor detection score is calculated from the predicted segmentation.

5. **Experimental Results**:
   - A series of experiments demonstrates that adding the tumor detection branch, channel-wise color augmentation, image-statistics-based color augmentation, and test-time augmentation significantly improves model performance.
   - Particularly on external validation sets, these improvements lead to significant enhancements in multi-class Dice scores and AUROC.

Through these methods, the study provides an effective automated solution for tissue segmentation and tumor detection in colorectal cancer pathology images.
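
To make two of the ingredients mentioned above more concrete, the following is a minimal NumPy sketch of test-time augmentation and a class-averaged Dice score. It is not taken from the authors' repository. The function names `predict_with_tta` and `multiclass_dice`, the choice of rotations and horizontal flips as TTA transforms, and the unweighted mean over classes are all assumptions made for illustration; the summary does not state which transforms or which exact Dice variant the authors and the challenge use.

```python
import numpy as np


def multiclass_dice(pred, target, num_classes):
    """Mean Dice over classes (assumed definition, unweighted average).

    pred, target: integer label maps of identical shape.
    Classes absent from both maps are skipped.
    """
    scores = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        denom = p.sum() + t.sum()
        if denom == 0:
            continue  # class c appears in neither map
        scores.append(2.0 * np.logical_and(p, t).sum() / denom)
    return float(np.mean(scores)) if scores else 1.0


def predict_with_tta(predict_fn, image):
    """Average per-pixel class probabilities over 8 augmented views.

    predict_fn: hypothetical callable mapping an (H, W, 3) image to
                an (H, W, C) probability map of the same spatial size.
    The views are 4 rotations by 90 degrees, each with and without a
    horizontal flip (an assumed TTA set, not confirmed by the source).
    """
    probs = []
    for k in range(4):
        for flip in (False, True):
            # Forward transform: rotate, then optionally flip.
            aug = np.rot90(image, k, axes=(0, 1))
            if flip:
                aug = np.flip(aug, axis=1)
            out = predict_fn(aug)
            # Inverse transform in reverse order: unflip, then unrotate.
            if flip:
                out = np.flip(out, axis=1)
            out = np.rot90(out, -k, axes=(0, 1))
            probs.append(out)
    return np.mean(probs, axis=0)
```

In use, `predict_fn` would wrap the segmentation branch of a trained model, and the averaged probability map would be converted to a label map with `argmax` over the last axis before computing the Dice score against the annotation.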