Semantic scene segmentation for indoor autonomous vision systems: leveraging an enhanced and efficient U-NET architecture

Tran, Hoang N.
DOI: https://doi.org/10.1007/s11042-024-19302-9
IF: 2.577
2024-05-10
Multimedia Tools and Applications
Abstract: Advancements in indoor autonomous vision systems (IAVSs) underscore the need to bridge the gap between their capabilities and human perception of real-world scenes. This paper introduces a novel semantic segmentation framework called EADFL-UNet, based on the U-Net architecture. It uses EfficientNetB3 as the encoder for improved feature extraction and employs a super attention block, which integrates attention gate (AG) and spatial and channel squeeze-and-excitation (scSE) mechanisms, to refine segmentation by prioritizing relevant areas and features. Additionally, a modified loss function that merges Dice loss (DL) and Class-Balanced Weights Focal loss (CBW-FL) addresses data imbalance, especially in liver segmentation and indoor environments. The evaluation, conducted on the NYUv2 dataset and augmented datasets, compared EADFL-UNet with various U-Net encoder configurations and demonstrated its superiority. Further analysis focused on integrating attention blocks at different stages of the U-Net architecture and revealed significant improvements in segmentation accuracy. The proposed method, even without depth information, outperforms conventional structures by 10% in mean Intersection over Union (mIoU), showing promise for applications in diverse IAVSs such as robotic vision, GPS, sports, and security.
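The abstract names the two loss terms but not how they are merged. The following is a minimal NumPy sketch of one plausible combination, not the authors' implementation. It assumes the class-balanced weights follow the common "effective number of samples" form (1 - beta) / (1 - beta^n_c), and that the two terms are blended linearly. The function names and the parameters `lam`, `beta`, and `gamma` are hypothetical choices made for illustration.

```python
import numpy as np

def class_balanced_weights(pixel_counts, beta=0.999):
    """Per-class weights from the effective number of samples (assumed form).

    pixel_counts must be strictly positive; rarer classes get larger weights.
    """
    counts = np.asarray(pixel_counts, dtype=np.float64)
    effective_num = 1.0 - np.power(beta, counts)
    weights = (1.0 - beta) / effective_num
    # Normalise so the weights sum to the number of classes.
    return weights / weights.sum() * len(counts)

def dice_loss(probs, onehot, eps=1e-6):
    """Soft Dice loss averaged over classes; inputs have shape (N, C)."""
    intersection = (probs * onehot).sum(axis=0)
    denominator = probs.sum(axis=0) + onehot.sum(axis=0)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice.mean()

def cbw_focal_loss(probs, onehot, weights, gamma=2.0, eps=1e-12):
    """Focal loss with each pixel scaled by the weight of its true class."""
    p_true = (probs * onehot).sum(axis=1)    # probability of the true class
    w_true = (weights * onehot).sum(axis=1)  # weight of the true class
    return np.mean(-w_true * (1.0 - p_true) ** gamma * np.log(p_true + eps))

def combined_loss(probs, labels, pixel_counts, lam=0.5, gamma=2.0, beta=0.999):
    """Linear blend of the two terms; lam is a hypothetical mixing weight."""
    num_classes = probs.shape[1]
    onehot = np.eye(num_classes)[labels]
    weights = class_balanced_weights(pixel_counts, beta)
    focal = cbw_focal_loss(probs, onehot, weights, gamma)
    return lam * dice_loss(probs, onehot) + (1.0 - lam) * focal
```

Here `probs` is a flattened array of per-pixel softmax outputs with shape (N, C), `labels` holds the integer class index of each pixel, and `pixel_counts` gives the number of training pixels per class. The Dice term rewards region overlap, while the weighted focal term concentrates the gradient on hard pixels from rare classes.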
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering