Lightweight Semantic Segmentation Network for Semantic Scene Understanding on Low-Compute Devices

James Weiland, H. Son
DOI: https://doi.org/10.1109/IROS55552.2023.10342110
2023-10-01
Abstract: Semantic scene understanding is beneficial for mobile robots. Semantic information obtained through onboard cameras can improve robots' navigation performance. However, obtaining semantic information on small mobile robots with constrained power and computation resources is challenging. We propose a new lightweight convolutional neural network comparable to previous semantic segmentation algorithms for mobile applications. Our network achieved 73.06% on the Cityscapes validation set and 71.8% on the Cityscapes test set. Our model runs at 116 fps at $1024 \times 2048$, 172 fps at $1024 \times 1024$, and 175 fps at $720 \times 960$ on an NVIDIA GTX 1080. We analyze model size, defined as the sum of the number of floating-point operations and the number of parameters. A smaller model size lets tiny mobile robot systems that must run multiple tasks simultaneously work efficiently. Our model has the smallest model size among the real-time semantic segmentation convolutional neural networks ranked on the Cityscapes real-time benchmark and other high-performing, lightweight convolutional neural networks. On the CamVid test set, our model achieved an mIoU of 73.29% with Cityscapes pre-training, outperforming other lightweight convolutional neural networks in accuracy. For mobile applicability, we measured frames per second on different low-compute devices. Our model runs at 35 fps on Jetson Xavier AGX, 21 fps on Jetson Xavier NX, and 14 fps on a ROS ASUS gaming phone. A resolution of $1024 \times 2048$ is used for the Jetson devices, and $512 \times 512$ for the measurement on the phone. Our network did not use extra datasets such as ImageNet, Coarse Cityscapes, and Mapillary. Additionally, we did not use TensorRT to achieve fast inference speed. Compared to other real-time and lightweight CNNs, our model is significantly more efficient while balancing accuracy, inference speed, and model size.
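The model-size metric described in the abstract (number of floating-point operations plus number of parameters) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the `Conv2d` helper, the layer shapes, the convention of counting one multiply-accumulate as one operation, and the summing of raw counts without unit scaling are not taken from the paper, which may count or scale these quantities differently.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Conv2d:
    """Hypothetical description of a standard 2D convolution layer."""
    in_ch: int
    out_ch: int
    kernel: int
    out_h: int   # height of the output feature map
    out_w: int   # width of the output feature map
    bias: bool = True

    def params(self) -> int:
        # One k x k filter per (input channel, output channel) pair, plus biases.
        weights = self.in_ch * self.out_ch * self.kernel * self.kernel
        return weights + (self.out_ch if self.bias else 0)

    def flops(self) -> int:
        # Assumed convention: one multiply-accumulate counts as one operation.
        ops_per_output = self.in_ch * self.kernel * self.kernel
        return ops_per_output * self.out_ch * self.out_h * self.out_w


def model_size(layers: Iterable[Conv2d]) -> int:
    """Sum of total operations and total parameters (raw counts, assumed units)."""
    layers = list(layers)
    total_flops = sum(layer.flops() for layer in layers)
    total_params = sum(layer.params() for layer in layers)
    return total_flops + total_params


if __name__ == "__main__":
    # Two made-up layers; not the architecture from the paper.
    toy_network = [
        Conv2d(in_ch=3, out_ch=16, kernel=3, out_h=512, out_w=1024),
        Conv2d(in_ch=16, out_ch=32, kernel=3, out_h=256, out_w=512),
    ]
    print("model size (ops + params):", model_size(toy_network))
```

Because the operation count grows with input resolution while the parameter count does not, a combined metric of this kind depends on the resolution at which it is evaluated.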
Computer Science, Engineering