To-Former: semantic segmentation of transparent object with edge-enhanced transformer

Jiawei Chen, Wen Su, Mengjiao Ge, Ye He, Jun Yu
DOI: https://doi.org/10.1007/s00371-024-03494-0
IF: 2.835
2024-06-02
The Visual Computer
Abstract: Transparent objects are widely present in our daily environment. Precise semantic segmentation of transparent objects is crucial for enhancing the perception and comprehension of visual scenes in computer vision systems. Unlike conventional objects, transparent objects are difficult to distinguish from the background or other objects due to blending, color changes, and blurred boundaries. Accurate segmentation requires global context, multi-scale edge information, and long-distance pixel dependencies. Based on these observations, we propose a simple and efficient semantic segmentation architecture for transparent objects, the transparent object transformer (To-Former). We introduce an edge-enhanced multi-head self-attention mechanism that incorporates multi-scale separable convolution and pooling and is specially designed to efficiently extract both edge and global features from transparent objects. We propose a convolutional feed-forward network that enhances features at multiple scales to attain a superior feature representation while reducing computational complexity. To-Former uses a reinforced decoder that integrates a feature channel enhancement module, which can significantly enhance feature channels and thereby improve feature fusion and the segmentation of different feature categories. Benchmark experiments show that To-Former achieves consistent improvements, with fewer parameters, over several typical and state-of-the-art segmentation baselines on challenging public benchmarks. It also performs notably well at localizing the boundaries of transparent objects in both simple and complex scenes. The code is available at https://github.com/cehn-jiawei/To-Former
computer science, software engineering