Deep contrastive learning based tissue clustering for annotation-free histopathology image analysis

Jiangpeng Yan, Hanbo Chen, Xiu Li, Jianhua Yao
DOI: https://doi.org/10.1016/j.compmedimag.2022.102053
Abstract: Background: Deep convolutional neural networks (CNNs) have yielded promising results in automatic whole slide image (WSI) processing for digital pathology in recent years. Training supervised CNNs usually requires a large number of annotated samples. However, manual annotation of gigapixel WSIs is labor-intensive and error-prone, so the shortage of annotations has become the major bottleneck in developing WSI diagnosis models. In this work, we aim to develop a deep learning based self-supervised histopathology image analysis workflow that can classify tissues without any annotation. Methods: Inspired by the contrastive learning methods that have achieved state-of-the-art results on unsupervised representation learning for natural images, we adopt a self-supervised training scheme to generate discriminative embeddings from annotation-free WSI patches and simultaneously obtain initial clusters, which are further refined by a silhouette coefficient based recursive scheme that divides tissue mixture clusters. A multi-scale encoder network is specifically designed to extract pathology-specific contextual features. A tissue dictionary composed of the tissue clusters is then built for cancer diagnosis. Results: Experiments show that our method can identify different tissues in annotation-free conditions with results competitive with supervised methods (accuracy of 0.9364/0.9325 on human colorectal/sentinel lymph WSIs, versus 0.9806/0.9494 for the supervised methods) and that it surpasses other unsupervised baselines. Our method is also evaluated in a cohort of 20 clinical patients and achieves an AUC score of 0.99 in distinguishing benign from malignant polyps. Conclusion: Our proposed deep contrastive learning based tissue clustering method can learn from raw WSIs without annotation to distinguish different tissues. The method is tested on three different datasets and shows the potential to help pathologists diagnose diseases as a quantitative and qualitative tool.
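The abstract does not spell out the silhouette coefficient based recursive refinement, so the following is only a minimal sketch of one way such a scheme could look, not the authors' implementation. It assumes patch embeddings are already available as a NumPy array, and it uses scikit-learn's `KMeans` to try a two-way split of each cluster, keeping the split only when the silhouette score is high. The function name `refine_clusters` and the parameters `threshold`, `min_size`, and `max_depth` are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def refine_clusters(embeddings, threshold=0.5, min_size=20, max_depth=3, seed=0):
    """Recursively bisect clusters whose two-way split is well separated.

    Illustrative assumption: a cluster is treated as a "mixture" when
    splitting it in two yields a silhouette score of at least `threshold`.
    Returns one integer cluster label per row of `embeddings`.
    """
    labels = np.zeros(len(embeddings), dtype=int)
    next_label = [1]  # mutable counter shared with the nested function

    def split(indices, depth):
        # Stop when the recursion is deep enough or the cluster is too small.
        if depth >= max_depth or len(indices) < 2 * min_size:
            return
        subset = embeddings[indices]
        halves = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(subset)
        # Reject splits that would create a tiny cluster.
        if min(np.sum(halves == 0), np.sum(halves == 1)) < min_size:
            return
        # Reject splits whose two halves are not clearly separated.
        if silhouette_score(subset, halves) < threshold:
            return
        right = indices[halves == 1]
        labels[right] = next_label[0]
        next_label[0] += 1
        # Try to divide each half further.
        split(indices[halves == 0], depth + 1)
        split(right, depth + 1)

    split(np.arange(len(embeddings)), 0)
    return labels
```

Calling `refine_clusters(embeddings)` on an `(n_patches, n_features)` array returns one label per patch. In a full pipeline the starting point would be the initial clusters produced during contrastive training rather than a single all-in-one cluster, and the threshold would need tuning on the data at hand.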