Leveraging Pretrained Vision Transformers for Automated Cancer Diagnosis in Optical Coherence Tomography Images

Soumyajit Ray, Cheng-Yu Lee, Hyeon-Cheol Park, David Nauen, Chetan Bettegowda, Xingde Li, Rama Chellappa
DOI: https://doi.org/10.1101/2024.09.26.24314445
2024-09-27
Abstract: This study presents a novel approach to brain cancer detection based on Optical Coherence Tomography (OCT) images and advanced machine learning techniques. The research addresses the critical need for accurate, real-time differentiation between cancerous and noncancerous brain tissue during neurosurgical procedures. The proposed method combines a pre-trained Vision Transformer (ViT) model, specifically DINOv2, with a convolutional neural network (CNN) operating on Grey Level Co-occurrence Matrix (GLCM) texture features. This dual-path architecture leverages both the global context capture capabilities of transformers and the local texture analysis strengths of a CNN applied to GLCM features. The dataset comprised OCT images from 11 patients, with 5,831 B-frame slices used for training and validation, and 1,610 slices for testing. The model distinguished cancerous from noncancerous tissue with 99.7% accuracy on the training dataset, 99.4% on the validation dataset, and 94.9% on the test dataset. This approach demonstrates significant potential for improving intraoperative decision-making in brain cancer surgeries, offering real-time, high-accuracy tissue classification and surgical guidance.
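To make the texture path mentioned in the abstract more concrete, here is a minimal pure-Python sketch of how a GLCM can be computed from a quantized image patch, along with one common texture statistic (contrast). This is an illustration only, not the authors' implementation: the abstract does not state the pixel offsets, the number of grey levels, or the library used, so the function names `glcm` and `contrast`, the horizontal offset, the symmetric counting, and the 4-level toy patch are all assumptions.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized grey-level co-occurrence matrix for one pixel offset.

    image  : list of rows of integer grey levels in [0, levels)
    dx, dy : column and row offset of the neighbour (assumed; not from the paper)
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    # Count the pair in both directions so the matrix is symmetric.
                    counts[j][i] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs exist for this offset")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """GLCM contrast: co-occurrence probabilities weighted by squared level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Toy 4x4 patch with 4 grey levels (hypothetical data, not from the paper).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4)
print(round(contrast(p), 4))  # 0.5833
```

In a dual-path design like the one described, a matrix such as `p` (or a stack of matrices over several offsets) would be the input to the CNN branch, while the raw B-frame goes to the transformer branch. How the two branches are fused is not specified in the abstract.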