MaxCerVixT: A Novel Lightweight Vision Transformer-Based Approach for Precise Cervical Cancer Detection

Ishak Pacal
DOI: https://doi.org/10.1016/j.knosys.2024.111482
IF: 8.139
2024-02-18
Knowledge-Based Systems
Abstract: Cervical cancer is the fourth most frequent malignancy worldwide, and early detection is essential for its treatment. While the Pap smear test is the established approach for identifying cervical cancer, its reliability depends on the proficiency of healthcare professionals. Computer-aided diagnosis (CADx) systems use deep learning and medical image analysis to improve the accuracy and speed of diagnosis. Nonetheless, these systems face obstacles such as insufficient data, variation across images, and poor image quality. This article presents an advanced architectural framework, the Multi-Axis Vision Transformer (MaxViT), designed to address these challenges. Adapting MaxViT to Pap smear data yields a lightweight structure that offers superior accuracy and inference speed. To improve the proposed model's performance, we substituted the MBConv blocks in the MaxViT architecture with ConvNeXtv2 blocks and the MLP blocks with GRN-based MLPs. This modification not only reduced the parameter count but also enhanced the model's generalization capability. The proposed method was evaluated on the publicly available SIPaKMeD and Mendeley LBC Pap smear datasets, employing a total of 106 deep learning models (53 CNNs and 53 vision transformers) for each dataset. Compared with these experimental models and with state-of-the-art methods, the proposed method surpassed the existing literature and all deep learning models, achieving 99.02% accuracy on the SIPaKMeD dataset and 99.48% on the LBC dataset. This study stands out as the most extensive and comprehensive effort, employing 106 deep learning models for diagnosing cervical cancer through Pap smear images.
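The abstract mentions GRN-based MLPs without giving their definition. As a rough illustration only, the sketch below implements Global Response Normalization (GRN) as it is formulated for ConvNeXt V2, using plain Python lists for a single feature map. The function name `grn`, the `(H, W, C)` nested-list layout, and the `eps` value are assumptions made for this example. How the paper places GRN inside its MLP blocks is not stated in the abstract, so this is not the authors' implementation.

```python
import math

def grn(x, gamma, beta, eps=1e-6):
    """Global Response Normalization on one feature map (illustrative sketch).

    x: nested list of shape (H, W, C); gamma, beta: lists of length C.
    """
    channels = len(gamma)
    pixels = [px for row in x for px in row]  # flatten the spatial grid
    # 1) Aggregate: per-channel L2 norm over all spatial positions.
    g = [math.sqrt(sum(px[c] ** 2 for px in pixels)) for c in range(channels)]
    # 2) Normalize: divide each norm by the mean norm across channels.
    mean_g = sum(g) / channels
    n = [gc / (mean_g + eps) for gc in g]
    # 3) Calibrate: learnable scale and shift, plus a residual connection.
    return [
        [[gamma[c] * px[c] * n[c] + beta[c] + px[c] for c in range(channels)]
         for px in row]
        for row in x
    ]

# A 1x1 feature map with two channels, chosen so the arithmetic is easy to follow.
feature_map = [[[3.0, 1.0]]]
identity = grn(feature_map, gamma=[0.0, 0.0], beta=[0.0, 0.0])
scaled = grn(feature_map, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` and `beta` set to zero, the layer reduces to the identity because only the residual term remains. With nonzero `gamma`, channels whose global response is above the cross-channel mean are amplified relative to the others.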
computer science, artificial intelligence