CellViT: Vision Transformers for Precise Cell Segmentation and Classification

Fabian Hörst, Moritz Rempe, Lukas Heine, Constantin Seibold, Julius Keyl, Giulia Baldini, Selma Ugurel, Jens Siveke, Barbara Grünwald, Jan Egger, Jens Kleesiek

2023-10-06

Abstract: Nuclei detection and segmentation in hematoxylin and eosin-stained (H&E) tissue images are important clinical tasks and crucial for a wide range of applications. However, the task is challenging due to variations in nuclei staining and size, overlapping boundaries, and nuclei clustering. While convolutional neural networks have been extensively used for this task, we explore the potential of Transformer-based networks in this domain. We therefore introduce CellViT, a new method for automated instance segmentation of cell nuclei in digitized tissue samples using a deep learning architecture based on the Vision Transformer. CellViT is trained and evaluated on the PanNuke dataset, one of the most challenging nuclei instance segmentation datasets, consisting of nearly 200,000 nuclei annotated into 5 clinically important classes across 19 tissue types. We demonstrate the superiority of large-scale in-domain and out-of-domain pre-trained Vision Transformers by leveraging the recently published Segment Anything Model and a ViT encoder pre-trained on 104 million histological image patches, achieving state-of-the-art nuclei detection and instance segmentation performance on the PanNuke dataset with a mean panoptic quality of 0.50 and an F1-detection score of 0.83. The code is publicly available at https://github.com/TIO-IKIM/CellViT
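The two headline metrics can be illustrated with a minimal Python sketch. It uses the commonly cited definitions (panoptic quality as the sum of IoUs over true positives divided by TP + 0.5·FP + 0.5·FN, with a match requiring IoU > 0.5, and F1 as 2·TP / (2·TP + FP + FN)). The function names, the way matches are passed in, and the example numbers are assumptions made for illustration, not the authors' evaluation code; how the paper pairs predictions with ground truth for the detection score is not stated in this summary.

```python
def panoptic_quality(pair_ious, num_pred, num_true, iou_threshold=0.5):
    """Panoptic quality for one image (or one class).

    pair_ious: IoU of each candidate (prediction, ground truth) pair,
               with every instance appearing in at most one pair.
    num_pred / num_true: total predicted / annotated instances.
    """
    # A pair counts as a true positive only if its IoU exceeds the threshold.
    tp_ious = [iou for iou in pair_ious if iou > iou_threshold]
    tp = len(tp_ious)
    fp = num_pred - tp  # predictions without a valid match
    fn = num_true - tp  # annotated nuclei that were missed
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(tp_ious) / denom if denom else 0.0


def f1_detection(tp, fp, fn):
    """F1 score computed from detection counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical example: 4 predicted nuclei, 3 annotated nuclei,
# three candidate pairs of which two exceed IoU 0.5.
pq = panoptic_quality([0.8, 0.6, 0.4], num_pred=4, num_true=3)
f1 = f1_detection(tp=2, fp=2, fn=1)
```

A mean panoptic quality, as reported in the abstract, averages such per-class values; the exact averaging scheme is defined in the paper rather than here.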

Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

The paper addresses nuclei detection and segmentation in hematoxylin and eosin (H&E) stained tissue images, an important clinical task that underpins a wide range of applications. The task is challenging because nuclei vary in staining and size, have overlapping boundaries, and tend to cluster. Although convolutional neural networks (CNNs) have been widely used for this task, the authors explore the potential of Transformer-based networks in this field.

Specifically, the paper proposes CellViT, a method for automatic instance segmentation and classification of nuclei in digitized tissue samples. CellViT is based on the Vision Transformer architecture and combines large-scale pre-training with fine-tuning on the target dataset. It achieves state-of-the-art performance on the PanNuke dataset, which contains nearly 200,000 annotated nuclei across 19 tissue types and 5 clinically important nuclei classes.

The main contributions are:

1. A novel U-Net-shaped encoder-decoder network that uses a Vision Transformer as the encoder. It significantly surpasses existing nuclei detection methods and achieves segmentation results comparable to other state-of-the-art methods on the PanNuke dataset.

2. The first application of the Vision Transformer to nuclei instance segmentation on the PanNuke dataset, demonstrating its effectiveness in this field. The method combines a pre-trained ViT encoder with a decoder network connected through skip connections.

3. A framework for fast inference on gigapixel whole-slide images (WSIs) that uses large inference patches of 1024×1024 pixels, which is 1.85 times faster than inference with conventional 256-pixel patches.

Through these contributions, CellViT improves the accuracy of nuclei detection and segmentation and also provides a reliable feature extraction tool for downstream tasks.
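To make the patch-size trade-off concrete, the following Python sketch counts how many patches are needed to tile an image at 1024 versus 256 pixels. The helper names, the 4096×4096 example region, and the 64-pixel overlap are hypothetical choices for illustration; they are not taken from the CellViT inference pipeline.

```python
import math


def tiles_per_axis(length, patch, overlap=0):
    """Number of patches needed to cover one image axis."""
    if overlap >= patch:
        raise ValueError("overlap must be smaller than the patch size")
    if length <= patch:
        return 1
    stride = patch - overlap  # distance between neighbouring patch origins
    return math.ceil((length - patch) / stride) + 1


def tile_count(width, height, patch, overlap=0):
    """Total number of patches needed to cover a width x height image."""
    return tiles_per_axis(width, patch, overlap) * tiles_per_axis(height, patch, overlap)


# Hypothetical 4096 x 4096 region, tiled without overlap.
small = tile_count(4096, 4096, patch=256)   # 16 * 16 patches
large = tile_count(4096, 4096, patch=1024)  # 4 * 4 patches
# Same region with an assumed 64-pixel overlap between large patches.
large_overlap = tile_count(4096, 4096, patch=1024, overlap=64)
```

Fewer patches do not translate one-to-one into runtime: each 1024-pixel patch is more expensive to process, since a Vision Transformer's attention cost grows with the number of tokens. That is consistent with the reported speed-up of 1.85 times being much smaller than the reduction in patch count.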

TransNuSeg: A Lightweight Multi-Task Transformer for Nuclei Segmentation

MMViT-Seg: A Lightweight Transformer and CNN Fusion Network for COVID-19 Segmentation

A Robust Deep Learning Approach for Joint Nuclei Detection and Cell Classification in Pan-Cancer Histology Images

Enhancing Cell Detection in Histopathology Images: A ViT-Based U-Net Approach

Channel Boosted CNN-Transformer-based Multi-Level and Multi-Scale Nuclei Segmentation

EfficientUNetViT: Efficient Breast Tumor Segmentation Utilizing UNet Architecture and Pretrained Vision Transformer

CViTS-Net: A CNN-ViT Network With Skip Connections for Histopathology Image Classification

Vision transformer introduces a new vitality to the classification of renal pathology

NuLite -- Lightweight and Fast Model for Nuclei Instance Segmentation and Classification

Pathological Insights: Enhanced Vision Transformers for the Early Detection of Colorectal Cancer

UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation

A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation

MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets

BUViTNet: Breast Ultrasound Detection via Vision Transformers

ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy

Implementing vision transformer for classifying 2D biomedical images

MATNet: a multi-attention transformer network for nuclei segmentation in thymoma histopathology images

DCT-HistoTransformer: Efficient Lightweight Vision Transformer with DCT Integration for histopathological image analysis

Vision Transformers for Computational Histopathology