Abstract: Semantic segmentation provides pixel-level classification of the surrounding environment, an essential task for the visual perception of autonomous vehicles and mobile robots. Most existing semantic segmentation networks focus on the visual perception of autonomous vehicles. Little attention has been paid to semantic segmentation for UAV (Unmanned Aerial Vehicle) visual perception, which is crucial to UAV autonomous flight and landing spot searching. Compared with views from autonomous vehicles, UAV-based views are more challenging for semantic segmentation because images captured by UAVs contain large variations in object scale caused by differing altitudes and viewing angles. Existing semantic segmentation networks designed for autonomous vehicles are generally inadequate for extracting representative features from UAV images, which must contain context information and local information simultaneously. This study proposes CCTseg, a cascade composite transformer-based semantic segmentation network for UAV visual perception. A cascade composite encoder is designed, consisting of three transformer-based feature extraction backbones that fuse low-level, middle-level and high-level features in a cascade to achieve better feature representation capacity. A spatial enhanced transformer block serves as the basic feature extraction block of each backbone, so that the extracted features contain both context information about the environment and local information about objects. A symmetric rhombus decoder is proposed to integrate multi-stage features and make full use of middle-stage features, which contain abundant useful information, so that accurate pixel-level predictions can be obtained. Ablation studies and comparison experiments for the proposed CCTseg were conducted on two public UAV imagery datasets suitable for UAV autonomous flight and landing spot observation. Experimental results demonstrate the effectiveness of the proposed network structure and its superiority over other state-of-the-art methods for semantic segmentation in UAV visual perception.

A Cross-Scale Hierarchical Transformer With Correspondence-Augmented Attention for Inferring Bird’s-Eye-View Semantic Segmentation

SemiCVT: Semi-Supervised Convolutional Vision Transformer for Semantic Segmentation

SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

Semantic segmentation using cross-stage feature reweighting and efficient self-attention

Epipolar Attention Field Transformers for Bird's Eye View Semantic Segmentation

A Bio-Inspired Visual Perception Transformer for Cross-Domain Semantic Segmentation of High-Resolution Remote Sensing Images

CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention

BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs

CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers

Surrounding-aware representation prediction in Birds-Eye-View using transformers

Improving Bird’s Eye View Semantic Segmentation by Task Decomposition

Case report: adverse granulomatous reaction (Granuloma formation) and pseudomonas superinfection after lip augmentation by the new filler DermaLive®

CCTseg: A cascade composite transformer semantic segmentation network for UAV visual perception

Representation Separation for Semantic Segmentation with Vision Transformers

Focus on BEV: Self-calibrated Cycle View Transformation for Monocular Birds-Eye-View Segmentation

Hybrid Attention Fusion Embedded in Transformer for Remote Sensing Image Semantic Segmentation