Abstract: Cervical cancer seriously endangers the health of the female reproductive system and can be life-threatening in severe cases. Optical coherence tomography (OCT) is a non-invasive, real-time, high-resolution imaging technology for cervical tissues. However, because interpreting cervical OCT images is a knowledge-intensive, time-consuming task, it is difficult to acquire a large number of high-quality labeled images quickly, which poses a major challenge for supervised learning. In this study, we introduce the vision Transformer (ViT) architecture, which has recently achieved impressive results in natural image analysis, into the classification of cervical OCT images. Our work aims to develop a computer-aided diagnosis (CADx) approach based on a self-supervised ViT-based model to classify cervical OCT images effectively. We use masked autoencoders (MAE) for self-supervised pre-training on cervical OCT images, which gives the proposed classification model better transfer learning ability. During fine-tuning, the ViT-based classification model extracts multi-scale features from OCT images of different resolutions and fuses them with a cross-attention module. Ten-fold cross-validation on an OCT image dataset from a multi-center clinical study of 733 patients in China shows that our model achieved an AUC of 0.9963 ± 0.0069 with a sensitivity of 95.89 ± 3.30% and a specificity of 98.23 ± 1.36%, outperforming several state-of-the-art classification models based on Transformers and convolutional neural networks (CNNs) in the binary classification task of detecting high-risk cervical diseases, including high-grade squamous intraepithelial lesion (HSIL) and cervical cancer. Furthermore, our model with the cross-shaped voting strategy achieved a sensitivity of 92.06% and a specificity of 95.56% on an external validation dataset containing 288 three-dimensional (3D) OCT volumes from 118 Chinese patients at a different hospital. This result met or exceeded the average of four medical experts who have used OCT for over one year. In addition to promising classification performance, our model can detect and visualize local lesions using the attention map of the standard ViT model, providing good interpretability that helps gynecologists locate and diagnose possible cervical diseases.
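
For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity and specificity could be computed for a binary high-risk versus low-risk task, and how per-frame predictions might be aggregated into one volume-level decision. The function names, the label convention (1 = high-risk) and the simple fraction-based vote are assumptions made for illustration; the vote is a generic stand-in and is not the cross-shaped voting strategy used in the paper, whose details are not given in the abstract.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = high-risk."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


def vote_volume(frame_preds: Sequence[int], min_positive_fraction: float = 0.5) -> int:
    """Hypothetical volume-level rule (NOT the paper's cross-shaped voting):
    label the volume high-risk when the share of positive frames
    strictly exceeds min_positive_fraction."""
    if not frame_preds:
        raise ValueError("frame_preds must not be empty")
    return int(sum(frame_preds) / len(frame_preds) > min_positive_fraction)


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
    print("volume label:", vote_volume([1, 1, 0]))
```

Sensitivity measures the share of truly high-risk cases that are flagged, and specificity the share of low-risk cases correctly cleared, which is why both are reported alongside AUC for a screening-oriented task.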

MaxCerVixT: A Novel Lightweight Vision Transformer-Based Approach for Precise Cervical Cancer Detection

Deep learning-based approaches for robust classification of cervical cancer

CVM-Cervix: A Hybrid Cervical Pap-Smear Image Classification Framework Using CNN, Visual Transformer and Multilayer Perceptron

ViT-PSO-SVM: Cervical Cancer Predication Based on Integrating Vision Transformer with Particle Swarm Optimization and Support Vector Machine

Enhancing cervical cancer diagnosis: Integrated attention-transformer system with weakly supervised learning

Lightweight Low-Rank Adaptation Vision Transformer Framework for Cervical Cancer Detection and Cervix Type Classification

CerviFormer: A Pap-smear based cervical cancer classification method using cross attention and latent transformer

A Deep Learning-Based Approach for Cervical Cancer Classification Using 3D CNN and Vision Transformer

MFEM-CIN: A Lightweight Architecture Combining CNN and Transformer for the Classification of Pre-Cancerous Lesions of the Cervix

Cross-Attention Based Multi-Resolution Feature Fusion Model for Self-Supervised Cervical OCT Image Classification

Enhancing cervical cancer detection and robust classification through a fusion of deep learning models

Optimal Deep Convolution Neural Network for Cervical Cancer Diagnosis Model

YOLO-based CAD framework with ViT transformer for breast mass detection and classification in CESM and FFDM images

CervixFormer: A Multi-scale swin transformer-Based cervical pap-Smear WSI classification framework

CerviXpert: A Multi-Structural Convolutional Neural Network for Predicting Cervix Type and Cervical Cell Abnormalities

CAM-VT: A Weakly supervised cervical cancer nest image identification approach using conjugated attention mechanism and visual transformer

Deep Learning Approaches for Analysing Papsmear Images to Detect Cervical Cancer

Vision transformer-convolution for breast cancer classification using mammography images: A comparative study

VTCNet: A Feature Fusion DL Model Based on CNN and ViT for the Classification of Cervical Cells

Multiple serous cavity effusion screening based on smear images using vision transformer