Semantic segmentation of oblique UAV video based on ConvLSTM in complex urban area
DOI: https://doi.org/10.1007/s12145-024-01355-x
2024-06-09
Earth Science Informatics
Abstract: Semantic segmentation of UAV data has been a research hotspot in recent years. Many applications, such as aerial mapping of urban–rural areas and automatic extraction of roads and buildings, require accurate and efficient segmentation algorithms. The proposed method implements a deep learning framework that combines the SegNet encoder-decoder architecture with a convolutional long short-term memory (ConvLSTM) neural network for semantic segmentation of oblique video of urban scenes. SegNet, a frame-based semantic segmentation method, together with the ConvLSTM, learns spatial and temporal information and produces the final video segmentation results. The proposed method was evaluated on two UAV oblique video datasets, UAVid and the Varied Drone Dataset (VDD). The results were compared with those of two other neural networks, U-Net and FCN, each combined with the Conv-RNN architecture. By extracting spatial–temporal features, the proposed method significantly improved the segmentation results on both video datasets. The proposed SegNet-ConvLSTM architecture achieved the best results on both the UAVid and VDD video sets, with 81.72% and 87.03% accuracy, respectively. The UNet-ConvLSTM architecture, with 77.57% and 84.42% accuracy, and the FCN-ConvLSTM architecture, with 73.79% and 82.51% accuracy, on the UAVid2020 and VDD datasets respectively, provided weaker semantic segmentation. Moreover, on an NVIDIA Tesla T4 GPU, the SegNet-ConvLSTM architecture segments each video sequence with an average inference time of approximately 33 ms per frame, which makes it suitable for real-time applications.
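The real-time claim rests on simple arithmetic: a per-frame latency of roughly 33 ms corresponds to about 30 frames per second. The following is a minimal sketch of that conversion, not part of the published method. The function names and the 30 fps input frame rate are assumptions made for illustration; the abstract states only the 33 ms figure.

```python
def frames_per_second(latency_ms: float) -> float:
    """Convert a per-frame inference latency (ms) to throughput (frames/s)."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def keeps_up_with(latency_ms: float, video_fps: float) -> bool:
    """Return True if one frame is processed no slower than frames arrive."""
    frame_interval_ms = 1000.0 / video_fps
    return latency_ms <= frame_interval_ms


# Average latency reported in the abstract (approximate).
REPORTED_LATENCY_MS = 33.0
# Assumed input frame rate; the abstract does not state it.
ASSUMED_VIDEO_FPS = 30.0

throughput = frames_per_second(REPORTED_LATENCY_MS)
print(f"{throughput:.1f} frames per second")
print(keeps_up_with(REPORTED_LATENCY_MS, ASSUMED_VIDEO_FPS))
```

Under these assumptions the margin is narrow: a 30 fps stream delivers a frame every 33.3 ms, so the reported average leaves almost no headroom, and slower frames would need buffering or frame skipping.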
geosciences, multidisciplinary; computer science, interdisciplinary applications