Unsupervised Domain Adaptation Approach for Vision-Based Semantic Understanding of Bridge Inspection Scenes Without Manual Annotations

Yasutaka Narazaki, Wendong Pang, Gaoang Wang, Wenhao Chai
DOI: https://doi.org/10.1061/jbenf2.beeng-6490
2024-01-01
Journal of Bridge Engineering
Abstract: Deep learning (DL)-based visual recognition algorithms are widely investigated as a way to improve the accuracy, efficiency, and objectivity of the bridge inspection process, which is largely manual today. These algorithms typically require a large amount of training data consisting of images and corresponding annotations. Preparing such data sets manually is time-consuming, while more automated data generation approaches that rely on synthetic environments suffer from domain gaps, which result in poor performance in real-world tasks. This study investigates an unsupervised domain adaptation (UDA) approach for visual recognition in bridge inspection scenes to reduce and eventually eliminate the need for time-consuming and inaccurate manual image annotations. A state-of-the-art UDA framework, termed DAFormer, is applied to synthetic source domain data with full annotations and real-world target domain data with no or partial annotations. The synthetic data set in this study is designed to correlate with real-world data by incorporating the relevant design standards and practices into the modeling step. Compared with the source-only supervised learning approach (which performed poorly on real-world data), the UDA improved the performance to a level close to that of supervised learning with manually annotated real-world data (the Intersection over Union (IoU) difference was only 1.03%). Furthermore, the UDA approach outperformed supervised learning on target domain data when a small amount of annotated target domain data was mixed with the synthetic source domain data to guide the network's learning of patterns that exist only in the real-world environment (the IoU improvement was 5.03%). The UDA approach presented in this study facilitates the application of DL-based visual recognition algorithms to bridge inspection tasks with limited manual effort.
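The results above are reported as Intersection over Union (IoU), the ratio of correctly labeled pixels of a class to all pixels that either the prediction or the annotation assigns to that class. The following is a minimal sketch of that metric, not the authors' evaluation code. The function names, the flattened integer label lists, and the three-class toy example are hypothetical, and averaging over classes is an assumption, since the abstract does not state how the reported IoU is aggregated.

```python
from typing import List, Sequence


def per_class_iou(pred: Sequence[int], truth: Sequence[int],
                  num_classes: int) -> List[float]:
    """Return the IoU of each class for flattened per-pixel labels.

    Classes that appear in neither `pred` nor `truth` get NaN.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, truth):
        if p == t:
            # The pixel counts once toward both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1

    return [i / u if u > 0 else float("nan")
            for i, u in zip(intersection, union)]


def mean_iou(pred: Sequence[int], truth: Sequence[int],
             num_classes: int) -> float:
    """Average the per-class IoU, skipping classes that never occur."""
    valid = [v for v in per_class_iou(pred, truth, num_classes) if v == v]
    if not valid:
        raise ValueError("no class occurs in pred or truth")
    return sum(valid) / len(valid)


# Hypothetical toy labels for six pixels and three classes.
predicted = [0, 0, 1, 1, 2, 2]
annotated = [0, 1, 1, 1, 2, 0]
class_scores = per_class_iou(predicted, annotated, num_classes=3)
overall_score = mean_iou(predicted, annotated, num_classes=3)
```

In this toy case the per-class values are 1/3, 2/3, and 1/2, so the class average is about 0.5. A difference such as the 1.03% quoted above would correspond to the gap between two such scores computed on the same real-world test images.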