Abstract:

Objective: This study aims to develop an efficient multimodal learning framework for glaucoma classification. Glaucoma is a group of eye diseases that can cause vision loss and blindness, often because detection and treatment come too late. Fundus images and optical coherence tomography (OCT) images are both valuable for diagnosing and managing glaucoma. However, current models that combine features from the two modalities often lack efficient modeling of spatial relationships.

Approach: We use a 3D transformer model to extract features from OCT volumes, exploiting the transformer's ability to capture long-range spatial relationships. Downsampling is applied to improve model efficiency. The features extracted from the OCT volumes and the fundus images are then fused using the spatial feature relationships between the two modalities.

Main Results: In experiments on the GAMMA dataset, the proposed framework outperformed traditional feature fusion methods in glaucoma grading. By modeling spatial relationships and combining OCT volume and fundus image features, it achieved strong classification results.

Significance: Efficient and accurate glaucoma classification is essential for timely intervention and prevention of vision loss. The proposed approach, which integrates a 3D transformer model, offers a novel way to extract and fuse features from OCT volumes and fundus images. This work has the potential to improve patient care, particularly in the early detection and treatment of glaucoma, thereby reducing the risk of vision impairment and blindness.
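The efficiency argument for downsampling can be made concrete with a little arithmetic. Full self-attention scores every pair of tokens, so its cost grows with the square of the token count, and a 3D volume produces many tokens. The sketch below is only an illustration under stated assumptions: the volume shape, the patch size, and the downsampling factor are hypothetical values chosen for round numbers, and the abstract does not say where or how the authors downsample.

```python
from math import prod


def token_count(volume_shape, patch_shape):
    """Number of non-overlapping 3D patches (tokens) in a volume."""
    if any(v % p for v, p in zip(volume_shape, patch_shape)):
        raise ValueError("each volume dimension must be divisible by the patch size")
    return prod(v // p for v, p in zip(volume_shape, patch_shape))


def attention_pairs(n_tokens):
    """Query-key pairs scored by one full self-attention layer."""
    return n_tokens * n_tokens


def downsample(volume_shape, factor):
    """Shape after reducing every axis by an integer factor."""
    return tuple(v // factor for v in volume_shape)


# Hypothetical values, not taken from the paper.
volume = (256, 512, 512)  # (slices, height, width) of an OCT volume
patch = (16, 16, 16)      # 3D patch size used to form tokens

full_tokens = token_count(volume, patch)
small_tokens = token_count(downsample(volume, 2), patch)

print(full_tokens, small_tokens)  # 16384 2048
print(attention_pairs(full_tokens) // attention_pairs(small_tokens))  # 64
```

Under these assumptions, halving each axis cuts the token count by a factor of 8 and the number of attention pairs by a factor of 64, which is why downsampling matters far more for 3D inputs than for 2D ones.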

Representation, Alignment, Fusion: A Generic Transformer-Based Framework for Multi-modal Glaucoma Recognition

Adapting the Segment Anything Model for Multi-modal Retinal Anomaly Detection and Localization

MSTNet: method for glaucoma grading based on multimodal feature fusion of spatial relations

Spatial-aware Transformer-GRU Framework for Enhanced Glaucoma Diagnosis from 3D OCT Imaging

Multimodal Information Fusion for Glaucoma and DR Classification

Cross-Fundus Transformer for Multi-modal Diabetic Retinopathy Grading with Cataract

Deep Relation Transformer for Diagnosing Glaucoma With Optical Coherence Tomography and Visual Field Function

Multi-modality Network Based on CGAN and Attention Mechanism for Glaucoma Grading

Transformer-based Cross-Modal Multi-Contrast Network for Ophthalmic Diseases Diagnosis

Unifying Structure Analysis and Surrogate-driven Function Regression for Glaucoma OCT Image Screening

Multi-step framework for glaucoma diagnosis in retinal fundus images using deep learning

Geometric Correspondence-Based Multimodal Learning for Ophthalmic Image Analysis

Towards multi-center glaucoma OCT image screening with semi-supervised joint structure and function multi-task learning

Multi-Modal Multi-Instance Learning for Retinal Disease Recognition

Asynchronous feature regularization and cross-modal distillation for OCT based glaucoma diagnosis

ETSCL: An Evidence Theory-Based Supervised Contrastive Learning Framework for Multi-modal Glaucoma Grading

A Classification Model for Glaucoma Grading Using Multi-Modal Image Fusion Strategies

COROLLA: An Efficient Multi-Modality Fusion Framework with Supervised Contrastive Learning for Glaucoma Grading

Improving the generalization of glaucoma detection on fundus images via feature alignment between augmented views

Multiple Modality Fusion for Glaucoma Diagnosis

Multi-resolution visual Mamba with multi-directional selective mechanism for retinal disease detection