Abstract: Deep learning (DL)-based visual recognition algorithms are widely investigated as a way to improve the accuracy, efficiency, and objectivity of the bridge inspection process, which is largely manual today. These algorithms typically require a large amount of training data consisting of images and corresponding annotations. Preparing such data sets manually is time-consuming, while more automated data generation approaches that rely on synthetic environments suffer from domain gaps, which lead to poor performance on real-world tasks. This study investigates an unsupervised domain adaptation (UDA) approach for visual recognition in bridge inspection scenes, with the aim of reducing and eventually eliminating the need for time-consuming and inaccurate manual image annotations. A state-of-the-art UDA framework, termed DAFormer, is applied to synthetic source domain data with full annotations and real-world target domain data with no or partial annotations. The synthetic data set in this study is designed to correlate with real-world data by incorporating the relevant design standards and practices into the modeling step. Compared with the source-only supervised learning approach, which performed poorly on real-world data, the UDA approach improved performance to a level close to that of supervised learning on real-world data with manual annotations (the Intersection over Union (IoU) difference was only 1.03%). Furthermore, the UDA approach outperformed supervised learning on target domain data when a small amount of annotated target domain data was mixed with the synthetic source domain data to guide the network's learning of patterns that exist only in the real-world environment (the IoU improvement was 5.03%). The UDA approach presented in this study facilitates the application of DL-based visual recognition algorithms to bridge inspection tasks with limited manual effort.
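
The results above are reported as Intersection over Union (IoU). As a point of reference, the following is a minimal sketch of the standard IoU definition for semantic segmentation, computed per class and then averaged. It is not the authors' evaluation code: the function names `per_class_iou` and `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration, and the abstract does not state how the reported IoU was averaged.

```python
from typing import Dict, Sequence


def per_class_iou(pred: Sequence[int], target: Sequence[int],
                  num_classes: int) -> Dict[int, float]:
    """Per-class IoU over flattened label maps.

    Classes that appear in neither `pred` nor `target` are skipped
    (an assumption of this sketch; conventions vary between toolkits).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious: Dict[int, float] = {}
    for c in range(num_classes):
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious[c] = intersection[c] / union
    return ious


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int) -> float:
    """Unweighted mean of the per-class IoU values."""
    ious = per_class_iou(pred, target, num_classes)
    return sum(ious.values()) / len(ious) if ious else 0.0


# Toy example: six pixels and three hypothetical classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(per_class_iou(pred, target, 3))
print(mean_iou(pred, target, 3))
```

Under this definition, an IoU difference such as the 1.03% quoted above is a difference between two such scores expressed in percentage points, each computed on the same real-world test images.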

Unsupervised Domain Adaptation Approach for Vision-Based Semantic Understanding of Bridge Inspection Scenes Without Manual Annotations
