Abstract:

Background: Diagnosing mediastinal tumours, including incidental lesions, on low-dose CT (LDCT) performed for lung cancer screening is challenging. Proper characterisation and surgical planning often require additional invasive and costly tests. This points to a gap in existing diagnostic methods, a need for a more efficient and patient-centred approach, and the potential for artificial intelligence technologies to address that gap. This study aimed to create a multimodal hybrid transformer model, based on the Vision Transformer, that leverages LDCT features and clinical data to improve surgical decision-making for patients with incidentally detected mediastinal tumours.

Methods: This retrospective study analysed patients with mediastinal tumours between 2010 and 2021. Patients eligible for surgery (n=30) were considered 'positive,' whereas those without tumour enlargement (n=32) were considered 'negative.' We developed a hybrid model combining a convolutional neural network with a transformer to integrate imaging and clinical data. The dataset was split in a 5:3:2 ratio for training, validation and testing. The model's efficacy was evaluated using receiver operating characteristic (ROC) analysis across 25 iterations of random assignment and compared against conventional radiomics models and models excluding clinical data.

Results: The multimodal hybrid model achieved a mean area under the curve (AUC) of 0.90, significantly outperforming the model without clinical data (AUC=0.86, p=0.04) and the radiomics models (random forest AUC=0.81, p=0.008; logistic regression AUC=0.77, p=0.004).

Conclusion: Integrating clinical and LDCT data using a hybrid transformer model can improve surgical decision-making for mediastinal tumours, showing superiority over models lacking clinical data integration.
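The evaluation protocol described in the Methods (a 5:3:2 split repeated over 25 random assignments, scored by ROC AUC) can be illustrated with a minimal Python sketch. The cohort sizes, split ratio and iteration count are taken from the abstract; everything else is an assumption made for illustration. In particular, splitting each class separately (stratification) is assumed rather than reported, and `placeholder_score` is a hypothetical stand-in for a trained model, so the printed AUC says nothing about the study's results.

```python
import random
from statistics import mean

N_POSITIVE, N_NEGATIVE = 30, 32  # cohort sizes reported in the abstract
N_ITERATIONS = 25                # number of random assignments in the abstract


def split_532(ids, rng):
    """Shuffle ids and cut them 5:3:2 into train, validation and test lists."""
    ids = list(ids)
    rng.shuffle(ids)
    n_train = round(0.5 * len(ids))
    n_val = round(0.3 * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def stratified_split(labels, rng):
    """Split each class 5:3:2 separately (stratification is an assumption)."""
    parts = ([], [], [])
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        for part, chunk in zip(parts, split_532(members, rng)):
            part.extend(chunk)
    return parts


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def placeholder_score(label, rng):
    """Hypothetical stand-in for a model output; not the study's model."""
    return rng.gauss(1.0 if label else 0.0, 1.0)


labels = [1] * N_POSITIVE + [0] * N_NEGATIVE
rng = random.Random(0)
aucs = []
for _ in range(N_ITERATIONS):
    train_idx, val_idx, test_idx = stratified_split(labels, rng)
    # A real pipeline would fit on train_idx and tune on val_idx here.
    y_test = [labels[i] for i in test_idx]
    s_test = [placeholder_score(y, rng) for y in y_test]
    aucs.append(roc_auc(y_test, s_test))

print(f"mean test AUC over {N_ITERATIONS} splits: {mean(aucs):.2f}")
```

With 62 patients, this stratified 5:3:2 split leaves 12 test cases per iteration, which is one reason averaging over repeated random assignments is useful: a single small test set gives a noisy AUC estimate.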

A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics

Mmformer: Multimodal Medical Transformer for Incomplete Multimodal Learning of Brain Tumor Segmentation

A Transformer-Based Model for Preoperative Early Recurrence Prediction of Hepatocellular Carcinoma with Muti-modality MRI

A transformer-based unified multimodal framework for Alzheimer's disease assessment

Medical Diagnosis with Large Scale Multimodal Transformers: Leveraging Diverse Data for More Accurate Diagnosis

Learning A Multi-Task Transformer Via Unified And Customized Instruction Tuning For Chest Radiograph Interpretation

Medical transformer for multimodal survival prediction in intensive care: integration of imaging and non-imaging data

Integrating Medical Imaging and Clinical Reports Using Multimodal Deep Learning for Advanced Disease Analysis

A LLM-Based Hybrid-Transformer Diagnosis System in Healthcare

A multimodal transformer to fuse images and metadata for skin disease classification

Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification

Bidirectional Representation Learning From Transformers Using Multimodal Electronic Health Record Data to Predict Depression

Multimodal modeling with low-dose CT and clinical information for diagnostic artificial intelligence on mediastinal tumors: a preliminary study

METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens

Multi-modal Deep Learning

Deep Multi-modal Fusion of Image and Non-image Data in Disease Diagnosis and Prognosis: A Review

Automated Radiographic Report Generation Purely on Transformer: A Multicriteria Supervised Approach

MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report

Transformer-Based Classification Outcome Prediction for Multimodal Stroke Treatment