V-RoAst: A New Dataset for Visual Road Assessment

Natchapon Jongwiriyanurak, Zichao Zeng, June Moh Goo, Xinglei Wang, Ilya Ilyankou, Kerkritt Srirrongvikrai, Meihui Wang, James Haworth

2024-08-21

Abstract: Road traffic crashes cause millions of deaths annually and have a significant economic impact, particularly in low- and middle-income countries (LMICs). This paper presents an approach using Vision Language Models (VLMs) for road safety assessment, overcoming the limitations of traditional Convolutional Neural Networks (CNNs). We introduce a new task, V-RoAst (Visual question answering for Road Assessment), with a real-world dataset. Our approach optimizes prompt engineering and evaluates advanced VLMs, including Gemini-1.5-flash and GPT-4o-mini. The models effectively examine attributes for road assessment. Using crowdsourced imagery from Mapillary, our scalable solution effectively estimates road safety levels. In addition, this approach is designed for local stakeholders who lack resources, as it does not require training data. It offers a cost-effective and automated method for global road safety assessment, potentially saving lives and reducing economic burdens.

Computer Vision and Pattern Recognition, Artificial Intelligence, Emerging Technologies

What problem does this paper attempt to address?

The paper addresses the challenges of road traffic safety assessment, with a particular focus on low- and middle-income countries (LMICs). Its objectives are:

1. **Developing new visual road assessment methods**: Using Vision Language Models (VLMs) for road safety assessment to overcome the limitations of traditional Convolutional Neural Networks (CNNs) in this field.
2. **Introducing a new task and dataset**: Proposing a new task named V-RoAst (Visual question answering for Road Assessment) and constructing a real-world dataset for it.
3. **Optimizing prompt engineering**: Improving the performance of VLMs in road attribute recognition through optimized prompts.
4. **Evaluating advanced VLMs**: Assessing advanced VLMs such as Gemini-1.5-flash and GPT-4o-mini to test their effectiveness in road attribute detection tasks.
5. **Providing cost-effective solutions**: Proposing a scalable method that uses crowdsourced imagery from Mapillary to estimate road safety levels. The method does not require training data, which makes it practical for stakeholders in resource-limited areas.
6. **Supporting global road safety assessment**: Offering an automated and cost-effective method for global road safety assessment, which could help reduce the loss of life and the economic burden caused by traffic crashes.

In summary, this research aims to improve global road safety, especially in LMICs, by using VLMs to automatically detect and classify key attributes affecting road safety.
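To make the prompt-based, training-free idea concrete, here is a minimal sketch of how a zero-shot visual question answering prompt for road attributes might be assembled and how a model reply might be validated. Everything in it is an assumption for illustration: the attribute names and options in `ATTRIBUTES`, the helper names `build_prompt` and `parse_answer`, and the JSON reply format are hypothetical and are not taken from the paper or the V-RoAst dataset. No VLM client is called.

```python
import json

# Hypothetical attribute schema: names and options are illustrative only,
# not taken from the V-RoAst dataset.
ATTRIBUTES = {
    "carriageway": ["divided", "undivided"],
    "street_lighting": ["present", "not present"],
    "pedestrian_crossing": ["present", "not present"],
}


def build_prompt(attributes):
    """Build a zero-shot VQA prompt asking for one option per attribute."""
    lines = [
        "You are assessing a road from a street-level image.",
        "For each attribute, choose exactly one of the listed options.",
        "Answer with a single JSON object mapping attribute to option.",
        "",
    ]
    for name, options in attributes.items():
        lines.append(f"- {name}: {' | '.join(options)}")
    return "\n".join(lines)


def parse_answer(text, attributes):
    """Parse a model reply, keeping only answers that match an allowed option."""
    try:
        raw = json.loads(text)
    except json.JSONDecodeError:
        return {}
    if not isinstance(raw, dict):
        return {}
    return {
        name: value
        for name, value in raw.items()
        if name in attributes and value in attributes[name]
    }
```

In use, the string from `build_prompt(ATTRIBUTES)` would be sent together with a street-level image to whichever VLM is being evaluated, and the text reply passed to `parse_answer`. Restricting answers to a fixed option list is one simple way to keep free-form model output comparable across models.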

A Multimodal Data-Driven Approach for Driving Risk Assessment

A system of vision sensor based deep neural networks for complex driving scene analysis in support of crash risk assessment and prevention

A Computer Vision-assisted Approach to Automated Real-Time Road Infrastructure Management

VLM-Auto: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes

FARSA: Fully Automated Roadway Safety Assessment

Revolutionizing Urban Safety Perception Assessments: Integrating Multimodal Large Language Models with Street View Images

Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving

Reading Between the Lanes: Text VideoQA on the Road

BAAI-VANJEE Roadside Dataset: Towards the Connected Automated Vehicle Highway technologies in Challenging Environments of China

Semantic Understanding of Traffic Scenes with Large Vision Language Models

Mapping road safety features from streetview imagery: A deep learning approach

Evaluation and Comparison of Visual Language Models for Transportation Engineering Problems

Vision-Based Accident Anticipation and Detection Using Deep Learning

VATLD: A Visual Analytics System to Assess, Understand and Improve Traffic Light Detection

Evaluating Computer Vision Techniques for Urban Mobility on Large-Scale, Unconstrained Roads

ScVLM: a Vision-Language Model for Driving Safety Critical Event Understanding

Road Damages Detection and Classification with YOLOv7

Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses

Road Feature Detection for Advance Driver Assistance System Using Deep Learning

DriveLM: Driving with Graph Visual Question Answering