Abstract: Semantic segmentation of microscopy cell images by deep learning is a significant technique. We considered that Transformers, which have recently outperformed CNNs in image recognition, could also be improved and developed for cell image segmentation. Transformers tend to focus more on contextual information than on detailed information. This tendency leads to a lack of detailed information for segmentation. Therefore, to supplement or reinforce the missing detailed information, we hypothesized that feedback processing, as found in the human visual cortex, should be effective. Our proposed Feedback Former is a novel architecture for semantic segmentation that uses a Transformer as the encoder and adds a feedback processing mechanism. Feature maps with detailed information are fed back from near the output of the model to the lower layers, compensating for the lack of detailed information that is the weakness of Transformers and improving segmentation accuracy. Through experiments on three cell image datasets, we confirmed that our method surpasses methods without feedback, demonstrating its superior accuracy in cell image segmentation. Our method achieved higher segmentation accuracy at lower computational cost than conventional feedback approaches. Moreover, our method offered superior precision without simply increasing the model size of the Transformer encoder, demonstrating higher accuracy with lower computational cost.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is to improve the accuracy of semantic segmentation of cell images. Specifically, the authors note that the Transformer model performs well in image recognition tasks, but has deficiencies in cell image segmentation tasks, especially in capturing detailed information. This is because Transformers are more inclined to focus on contextual information and ignore detailed information, which is a significant weakness in cell image segmentation tasks that require high precision. To solve this problem, the authors propose a new architecture, Feedback Former, which combines the Transformer and a feedback processing module inspired by the feedback processing mechanism of the human visual cortex. By feeding back the feature maps containing detailed information from the part close to the output to the lower level, the detailed information lacking in Transformers is supplemented, thereby improving the accuracy of segmentation.

### Main problems and solutions

1. **Problems**:
   - The Transformer model performs poorly in cell image segmentation tasks, especially in capturing detailed structural information (such as tiny cell structures).
   - Traditional methods mainly rely on CNNs to obtain local information, but these methods may not be able to fully utilize the advantages of Transformers in capturing contextual information.
2. **Solutions**:
   - A new architecture, Feedback Former, is proposed, which uses the Transformer as an encoder and introduces a feedback processing mechanism.
   - The feedback processing mechanism draws on the working principle of the human visual cortex, feeding back the detailed information in the high-level feature maps to the low level to make up for the deficiency of Transformers in detailed information.
   - A lightweight feedback module (Lite Feedback Module) is introduced, which can efficiently extract and enhance important information in the feature maps and pass it to the next round of processing.
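The two-round feedback idea described above can be illustrated with a small NumPy sketch. This is not the paper's implementation: the class `ToyFeedbackSegmenter`, its random weights, the 1x1 channel-mixing stages that stand in for the Transformer encoder, and the additive fusion of the fed-back features are all assumptions made purely for illustration. It only shows the control flow: run the model once, transform the feature map taken near the output, inject it at the lowest layer, and run again.

```python
import numpy as np


def downsample(x):
    """Halve spatial resolution with 2x2 average pooling; x has shape (C, H, W)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample(x, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def mix_channels(x, weight):
    """1x1 'convolution': a linear map over the channel axis followed by ReLU."""
    return np.maximum(np.einsum("oc,chw->ohw", weight, x), 0.0)


class ToyFeedbackSegmenter:
    """Hypothetical two-round encoder/decoder; a stand-in, not Feedback Former itself."""

    def __init__(self, in_ch=3, width=8, n_classes=2, seed=0):
        rng = np.random.default_rng(seed)
        self.w_stem = rng.normal(size=(width, in_ch)) * 0.1
        self.w_stage1 = rng.normal(size=(width, width)) * 0.1
        self.w_stage2 = rng.normal(size=(width, width)) * 0.1
        # Assumed stand-in for a lightweight feedback module.
        self.w_feedback = rng.normal(size=(width, width)) * 0.1
        self.w_head = rng.normal(size=(n_classes, width)) * 0.1

    def _round(self, image, feedback=None):
        x = mix_channels(image, self.w_stem)              # lowest layer, full resolution
        if feedback is not None:
            x = x + feedback                              # assumed fusion: simple addition
        f1 = mix_channels(downsample(x), self.w_stage1)   # 1/2 resolution
        f2 = mix_channels(downsample(f1), self.w_stage2)  # 1/4 resolution
        # Feature map near the output: multi-scale features merged at full resolution.
        return x + upsample(f1, 2) + upsample(f2, 4)

    def forward(self, image, use_feedback=True):
        features = self._round(image)                     # first round, no feedback
        if use_feedback:
            feedback = mix_channels(features, self.w_feedback)
            features = self._round(image, feedback)       # second round with feedback
        # Per-pixel class logits of shape (n_classes, H, W).
        return np.einsum("oc,chw->ohw", self.w_head, features)


model = ToyFeedbackSegmenter(seed=0)
image = np.random.default_rng(1).random((3, 16, 16))      # H and W must be divisible by 4
logits = model.forward(image)
mask = logits.argmax(axis=0)                              # predicted class per pixel
```

Calling `model.forward(image, use_feedback=False)` gives the single-round baseline, which makes it easy to compare outputs with and without the feedback path in this toy setting.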
### Experimental results

By conducting experiments on three different cell image datasets, the authors verified the effectiveness of Feedback Former. The experimental results show that Feedback Former achieved higher segmentation accuracy than traditional methods on all three datasets. Especially on the iRPE dataset, the accuracy rate increased by 4.54%. In addition, compared with traditional feedback processing methods, Feedback Former not only improves accuracy but also reduces computational cost.

### Summary

The main contribution of this paper is to propose a new architecture, Feedback Former. By combining the Transformer and the feedback processing mechanism, it effectively solves the problem of Transformers lacking detailed information in cell image segmentation tasks, thereby significantly improving the accuracy of segmentation.

Accuracy Improvement of Cell Image Segmentation Using Feedback Former
