Laformer: Vision Transformer for Panoramic Image Semantic Segmentation

Zheng Yuan, Junhua Wang, Yuxin Lv, Ding Wang, Yi Fang
DOI: https://doi.org/10.1109/lsp.2023.3337716
2023-12-12
IEEE Signal Processing Letters
Abstract: Recent years have seen great advances in semantic segmentation. However, general methods target pinhole images and tend to underperform when applied directly to panoramic images. With the wide adoption of panoramic cameras, it is important to develop feasible approaches for training segmentation models suited to their real-time applications. To address this problem, we propose a novel self-training method and achieve comparable results on the DensePASS dataset. Specifically, we propose a deformable merge module tailored to panoramic images that efficiently and accurately incorporates features from different levels. We also design a novel prototype adaptation term that helps the model better learn the class-wise feature embeddings of distorted objects. Finally, we use a simple yet effective evaluation method to achieve real-time, improved inference performance. Combined, these components reach an mIoU of 58.27% on the DensePASS dataset, a new state-of-the-art result.
engineering, electrical & electronic
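
For readers unfamiliar with the mIoU figure quoted in the abstract, the following is a minimal sketch of how mean intersection-over-union is commonly computed. It is not taken from the paper: the function name `mean_iou`, the flattened label lists, and the ignore label 255 are illustrative assumptions.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class labels, e.g. all
    pixels of an evaluation set concatenated (an assumption made for
    brevity). Pixels whose target equals ignore_index are skipped; 255 is
    a common convention, assumed here rather than taken from the paper.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # Correct pixel: counts once in both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. Benchmark scores are usually obtained by accumulating the intersection and union counts over the whole evaluation set before dividing, which is what passing concatenated pixels to this function does.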