Vision Transformers (ViT) Pretraining on 3D ABUS Image and Dual-CapsViT: Enhancing ViT Decoding Via Dual-Channel Dynamic Routing

Mingwang Xu, Wei Wang, Kuanquan Wang, Suyu Dong, Pengzhong Sun, Jinwei Sun, Gongning Luo
DOI: https://doi.org/10.1109/bibm58861.2023.10385848
2023-01-01
Abstract: Breast cancer continues to be a pressing global health concern, emphasizing the essential need for effective diagnostic techniques. Automated Breast Ultrasound Systems (ABUS) provide a promising advance in breast tumor detection, yet they require significant expertise in interpreting 3D ABUS images, a task fraught with distinctive challenges. Although Vision Transformers (ViT) display remarkable potential for image processing, their low inductive bias and significant data requirements pose obstacles, particularly in the data-constrained medical field. To mitigate these issues, we introduce a Mask-Recover strategy for pretraining Transformer models on 3D ABUS images, enhancing model adaptability and reducing the data demands of the ViT model. Moreover, recognizing the risk that ViTs’ average pooling approach may unintentionally mask small but vital features, we propose Dual-CapsViT, an inventive model combining Transformers and Capsule Networks. This integration affords efficient token routing while preserving fine-grained details. To reconcile potential inconsistencies between capsules and tokens, we engineer a novel dual-channel routing algorithm, strengthening the decoder’s performance. We benchmarked our models against well-known standards such as ResNet and ViT for classifying breast tumors in ABUS images. Our models exhibited superior performance, as evidenced by improved accuracy, specificity, and Area Under the Receiver Operating Characteristic Curve (AUC) metrics, thereby affirming Dual-CapsViT’s potential to enhance breast cancer diagnostics.
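The abstract does not spell out how the Mask-Recover strategy works, so the following is only a generic illustration of masked-volume pretraining, not the authors' implementation. It assumes non-overlapping cubic patches, zero-filling of hidden patches, and a reconstruction loss computed over hidden voxels only. The function names `mask_patches` and `masked_mse`, the patch size, and the mask ratio are all hypothetical choices made for this sketch.

```python
import numpy as np


def mask_patches(volume, patch=4, mask_ratio=0.5, seed=0):
    """Zero out a random subset of non-overlapping cubic patches.

    Returns the masked volume and a boolean voxel mask (True = hidden).
    Patch size, ratio, and zero-filling are assumptions of this sketch.
    """
    d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    grid = (d // patch, h // patch, w // patch)
    n_patches = grid[0] * grid[1] * grid[2]
    n_masked = int(round(mask_ratio * n_patches))

    # Pick which patches to hide, without replacement.
    rng = np.random.default_rng(seed)
    hidden = np.zeros(n_patches, dtype=bool)
    hidden[rng.choice(n_patches, size=n_masked, replace=False)] = True

    # Expand the per-patch flags to a per-voxel mask.
    voxel_mask = hidden.reshape(grid)
    for axis in range(3):
        voxel_mask = np.repeat(voxel_mask, patch, axis=axis)

    masked = np.where(voxel_mask, 0.0, volume)
    return masked, voxel_mask


def masked_mse(prediction, target, voxel_mask):
    """Mean squared error over hidden voxels only (an assumed objective)."""
    diff = (prediction - target)[voxel_mask]
    return float(np.mean(diff ** 2))


# Toy 8x8x8 "volume" standing in for a 3D ABUS crop.
volume = np.random.default_rng(1).random((8, 8, 8))
masked, voxel_mask = mask_patches(volume, patch=4, mask_ratio=0.5)

# A perfect reconstruction has zero loss; the masked input itself does not.
loss_identity = masked_mse(volume, volume, voxel_mask)
loss_masked_input = masked_mse(masked, volume, voxel_mask)
```

In a real pretraining loop, `prediction` would come from a Transformer encoder-decoder that receives `masked` as input; here the two loss values only show that the objective rewards recovering the hidden regions.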