Multimodal modeling with low-dose CT and clinical information for diagnostic artificial intelligence on mediastinal tumors: a preliminary study

Daisuke Yamada, Fumitsugu Kojima, Yujiro Otsuka, Kouhei Kawakami, Naoki Koishi, Ken Oba, Toru Bando, Masaki Matsusako, Yasuyuki Kurihara
DOI: https://doi.org/10.1136/bmjresp-2023-002249
2024-04-08
Abstract:
Background: Diagnosing mediastinal tumours, including lesions found incidentally on low-dose CT (LDCT) performed for lung cancer screening, is challenging and often requires additional invasive and costly tests for proper characterisation and surgical planning. A more efficient, patient-centred approach is needed, and artificial intelligence technologies may help close this gap in existing diagnostic methods. This study aimed to create a multimodal hybrid transformer model, built on the Vision Transformer, that combines LDCT features with clinical data to improve surgical decision-making for patients with incidentally detected mediastinal tumours.
Methods: This retrospective study analysed patients with mediastinal tumours from 2010 to 2021. Patients eligible for surgery (n=30) were labelled 'positive', and those without tumour enlargement (n=32) were labelled 'negative'. We developed a hybrid model that combines a convolutional neural network with a transformer to integrate imaging and clinical data. The dataset was split 5:3:2 into training, validation and test sets. Model performance was evaluated with receiver operating characteristic (ROC) analysis over 25 iterations of random assignment and compared against conventional radiomics models and models that exclude clinical data.
Results: The multimodal hybrid model achieved a mean area under the curve (AUC) of 0.90, significantly outperforming the model without clinical data (AUC=0.86, p=0.04) and the radiomics models (random forest AUC=0.81, p=0.008; logistic regression AUC=0.77, p=0.004).
Conclusion: Integrating clinical and LDCT data in a hybrid transformer model can improve surgical decision-making for mediastinal tumours and outperforms models that lack clinical data integration.
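The evaluation protocol described in the Methods (a 5:3:2 train/validation/test split, redrawn at random 25 times, with ROC AUC computed on each test set) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `split_5_3_2`, `roc_auc` and `mean_test_auc`, the use of seeds 0 to 24, the rounding rule for split sizes, and the `score_fn` placeholder standing in for model training and inference are all hypothetical. AUC is computed here with the pairwise (Mann–Whitney) formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
import random
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int], List[int]]


def split_5_3_2(ids: Sequence[int], seed: int) -> Split:
    """Randomly partition patient IDs into train/validation/test at roughly 5:3:2.

    Assumption: sizes are rounded and the remainder goes to the test set;
    the abstract does not state the exact rounding or whether splits were stratified.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * 5 / 10)
    n_val = round(n * 3 / 10)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remaining patients form the test set
    return train, val, test


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(positive score > negative score), counting ties as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_test_auc(
    labels: Sequence[int],
    splits: Sequence[Split],
    score_fn: Callable[[List[int], List[int], List[int]], List[float]],
) -> float:
    """Average test-set AUC over repeated splits.

    score_fn is a hypothetical stand-in for fitting a model on train/val
    and returning one score per test patient, in test order.
    """
    aucs = []
    for train, val, test in splits:
        scores = score_fn(train, val, test)
        aucs.append(roc_auc([labels[i] for i in test], scores))
    return sum(aucs) / len(aucs)


# Cohort size from the abstract: 30 positive (surgical) and 32 negative patients.
labels = [1] * 30 + [0] * 32
patient_ids = list(range(len(labels)))

# 25 random assignments, one per (assumed) seed.
splits = [split_5_3_2(patient_ids, seed) for seed in range(25)]
train, val, test = splits[0]
print(len(train), len(val), len(test))  # prints: 31 19 12
```

With 62 patients, each test set holds only about 12 cases, so a purely random split can leave it noticeably unbalanced between classes; a stratified split would be a reasonable alternative, although the abstract does not say which approach was used.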