RFIA-Net: Rich CNN-transformer Network Based on Asymmetric Fusion Feature Aggregation to Classify Stage I Multimodality Oesophageal Cancer Images

Zhicheng Zhou, Gang Sun, Long Yu, Shengwei Tian, Guangli Xiao, Junwen Wang, Shaofeng Zhou
DOI: https://doi.org/10.1016/j.engappai.2022.105703
IF: 8
2023-01-01
Engineering Applications of Artificial Intelligence
Abstract: Endoscopic images of oesophageal cancer are rich in colour, and small lesions closely resemble the surrounding oesophageal wall tissue; pathological images vary in staining method and shape and contain rich texture detail. To address these characteristics, and drawing on the complementary strengths of convolutional architectures and vision transformers, we design an efficient hybrid architecture for the stage I multimodality oesophageal cancer image classification task. It leverages the local modelling and semantic feature extraction capabilities of convolutional neural networks together with the ability of transformers to extract global information, and it adopts a structural reparameterization strategy to further improve the model's expressive power. Specifically, our architecture consists of a feature extraction module and a feature enhancement module. In the feature enhancement module, we supplement the semantic information of each branch by continuously exchanging information between the two branches, which further improves the performance of the network. Furthermore, we propose an asymmetric fusion module that strengthens the feature relationships between different branches through spatial translation and channel swapping. Compared with networks such as ResNet-18, our proposed method achieves the best results on both tasks on the XJMU-XJU stage I multimodal oesophageal cancer dataset. It achieved an AUC of 0.9973 and an accuracy (ACC) of 0.9902 on the staging task, and a recall of 0.9742 and an ACC of 0.9750 on the differentiation task.
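
The abstract does not say which form of structural reparameterization the network uses. As a minimal sketch of the general idea, assuming the common RepVGG-style case of parallel 3x3, 1x1 and identity branches on a single channel, with batch normalization omitted, the branches used during training can be folded into one 3x3 kernel for inference. The function names below (conv2d_same, multi_branch, merge_branches) are hypothetical and are not taken from the paper.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (output keeps input size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity shortcut."""
    a = conv2d_same(image, k3)
    b = conv2d_same(image, k1)
    h, w = len(image), len(image[0])
    return [[a[i][j] + b[i][j] + image[i][j] for j in range(w)] for i in range(h)]


def merge_branches(k3, k1):
    """Inference-time form: fold the 1x1 and identity branches into the 3x3 kernel.

    Both extra branches only touch the centre pixel, so they add to the centre tap.
    """
    merged = [row[:] for row in k3]  # copy so the original kernel is untouched
    merged[1][1] += k1[0][0] + 1.0
    return merged


if __name__ == "__main__":
    # Illustrative values only; they do not come from the paper.
    image = [[1.0, 2.0, 0.5], [0.0, 4.0, 1.5], [2.5, 1.0, 3.0]]
    k3 = [[0.5, -1.0, 0.25], [0.0, 2.0, 0.5], [-0.5, 1.0, 0.75]]
    k1 = [[0.5]]
    slow = multi_branch(image, k3, k1)
    fast = conv2d_same(image, merge_branches(k3, k1))
    diff = max(abs(slow[i][j] - fast[i][j]) for i in range(3) for j in range(3))
    print("max difference between the two forms:", diff)
```

Because convolution is linear, the merged kernel gives the same output as the sum of the three branches while needing only one pass at inference time. A real implementation would also fold each branch's batch-normalization statistics into the kernel and bias before merging.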