Abstract: Deep learning-based (DL) visual recognition algorithms are widely investigated as a way to enhance the accuracy, efficiency, and objectivity of the bridge inspection process, which is largely manual today. These algorithms typically require a large amount of training data consisting of images and corresponding annotations. Preparing such data sets manually is time-consuming, while more automated data generation approaches aided by synthetic environments suffer from domain gaps that result in poor performance on real-world tasks. This study investigates an unsupervised domain adaptation (UDA) approach for visual recognition in bridge inspection scenes, with the goal of reducing and eventually eliminating the need for time-consuming and inaccurate manual image annotations. A state-of-the-art UDA framework, termed DAFormer, is applied to synthetic source domain data with full annotations and real-world target domain data with no or partial annotations. The synthetic data set in this study is designed to correlate with real-world data by incorporating the relevant design standards and practices into the modeling step. Compared with the source-only supervised learning approach, which performed poorly on real-world data, the UDA approach improved performance to a level close to that of supervised learning on real-world data with manual annotations (an Intersection over Union (IoU) difference of only 1.03%). Furthermore, when a small amount of annotated target domain data was mixed with the synthetic source domain data to guide the network's learning of patterns that exist only in the real-world environment, the UDA approach outperformed supervised learning on target domain data (an IoU improvement of 5.03%). The UDA approach presented in this study facilitates the application of DL-based visual recognition algorithms to bridge inspection tasks with limited manual effort.
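
For readers unfamiliar with the metric behind the reported differences, the following is a minimal sketch of how IoU is commonly computed for semantic segmentation. It is not the authors' evaluation code: the function name `mean_iou`, the flat label-list input, and the choice to average over classes (and to skip classes absent from both maps) are assumptions made for illustration. The abstract does not state which IoU variant was used.

```python
def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU for two flat sequences of integer class labels.

    Hypothetical helper for illustration; averaging over classes is an
    assumption, not something stated in the abstract.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels labeled c in both maps (intersection) or in either map (union).
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # class c appears in neither map, so it is skipped
        ious.append(inter / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x2 label maps, flattened row by row.
prediction = [0, 0, 1, 1]
ground_truth = [0, 1, 1, 1]
score = mean_iou(prediction, ground_truth, num_classes=2)
```

In this toy case class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is their average. A difference such as the 1.03% quoted above would be the gap between two such scores obtained from different training setups on the same test data.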

Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors

Unsupervised Domain Adaptation Approach for Vision-Based Semantic Understanding of Bridge Inspection Scenes Without Manual Annotations

TextNeRF: A Novel Scene-Text Image Synthesis Method Based on Neural Radiance Fields

Real-Aug: Realistic Scene Synthesis for LiDAR Augmentation in 3D Object Detection

Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models

A Scene-Text Synthesis Engine Achieved Through Learning from Decomposed Real-World Data

Scene Text Synthesis for Efficient and Effective Deep Network Training

UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition

Decoder Pre-Training with only Text for Scene Text Recognition

3D Vision and Language Pretraining with Large-Scale Synthetic Data

Increasing the Robustness of Deep Learning Models for Object Segmentation: A Framework for Blending Automatically Annotated Real and Synthetic Data

Text Recognition in Real Scenarios with a Few Labeled Samples

Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation

ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes

Text Enhancement Network for Cross-Domain Scene Text Detection

Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline Methodology

Generalized Label-Efficient 3D Scene Parsing via Hierarchical Feature Aligned Pre-Training and Region-Aware Fine-tuning

I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-Shaped Scene Text Detection

Training-free Composite Scene Generation for Layout-to-Image Synthesis

RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection