Multisemantic Level Patch Merger Vision Transformer for Diagnosis of Pneumonia

Zheng Jiang,Liang Chen
DOI: https://doi.org/10.1155/2022/7852958
IF: 2.809
2022-06-22
Computational and Mathematical Methods in Medicine
Abstract:The most popular test for pneumonia, a serious health threat to children, is chest X-ray imaging. However, the diagnosis of pneumonia relies on the expertise of experienced radiologists, and the scarcity of medical resources has forced us to conduct research on CAD (computer-aided diagnosis). In this study, we propose MP-ViT, the Multisemantic Level Patch Merger Vision Transformer, to achieve automatic diagnosis of pneumonia in chest X-ray images. We introduce Patch Merger to reduce the computational cost of ViT. Meanwhile, the intermediate results calculated by Patch Merger participate in the final classification in a concise way, so as to make full use of the intermediate information of the high-level semantic space to learn from local to overall and to avoid information loss caused by Patch Merger. We conducted experiments on a dataset with 3,883 chest X-ray images described as pneumonia and 1,349 images labeled as normal, and the results show that even without pretraining ViT on a large dataset, our model can achieve the accuracy of 0.91, the precision of 0.92, the recall of 0.89, and the F 1 -score of 0.90, which is better than Patch Merger on a small dataset. The model can provide CAD for physicians and improve diagnostic reliability.
mathematical & computational biology
What problem does this paper attempt to address?