V-RoAst: A New Dataset for Visual Road Assessment

Natchapon Jongwiriyanurak, Zichao Zeng, June Moh Goo, Xinglei Wang, Ilya Ilyankou, Kerkritt Srirrongvikrai, Meihui Wang, James Haworth
2024-08-21
Abstract: Road traffic crashes cause millions of deaths annually and have a significant economic impact, particularly in low- and middle-income countries (LMICs). This paper presents an approach using Vision Language Models (VLMs) for road safety assessment, overcoming the limitations of traditional Convolutional Neural Networks (CNNs). We introduce a new task, V-RoAst (Visual question answering for Road Assessment), with a real-world dataset. Our approach optimizes prompt engineering and evaluates advanced VLMs, including Gemini-1.5-flash and GPT-4o-mini. The models effectively examine attributes for road assessment. Using crowdsourced imagery from Mapillary, our scalable solution effectively estimates road safety levels. In addition, this approach is designed for local stakeholders who lack resources, as it does not require training data. It offers a cost-effective and automated method for global road safety assessment, potentially saving lives and reducing economic burdens.
Computer Vision and Pattern Recognition, Artificial Intelligence, Emerging Technologies
What problem does this paper attempt to address?
The paper addresses the challenges of road traffic safety assessment, focusing on low- and middle-income countries (LMICs). Its objectives are:

1. **Developing new visual road assessment methods**: Using Vision Language Models (VLMs) for road safety assessment to overcome the limitations of traditional Convolutional Neural Networks (CNNs) in this field.
2. **Introducing a new task and dataset**: Proposing a new task named V-RoAst (Visual question answering for Road Assessment) and constructing a real-world dataset for it.
3. **Optimizing prompt engineering**: Improving the performance of VLMs in road attribute recognition through optimized prompts.
4. **Evaluating advanced VLMs**: Assessing advanced VLMs such as Gemini-1.5-flash and GPT-4o-mini to test their effectiveness in road attribute detection.
5. **Providing cost-effective solutions**: Proposing a scalable method that uses crowdsourced imagery from Mapillary to estimate road safety levels. The method does not require training data, which makes it practical for stakeholders in resource-limited areas.
6. **Supporting global road safety assessment**: Offering an automated and cost-effective method for global road safety assessment, which can help reduce the loss of life and the economic burden caused by traffic crashes.

In summary, this research aims to improve global road safety, especially in LMICs, by using VLMs to automatically detect and classify key attributes affecting road safety.
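The prompt-based, training-free setup described above can be illustrated with a minimal Python sketch. It only shows how a road-attribute question might be framed as a text prompt and how a model's free-text reply might be validated against a fixed set of classes. The attribute names, class labels, and the functions `build_prompt` and `parse_answer` are hypothetical and are not taken from the paper or its dataset; the call to an actual VLM is left out deliberately.

```python
import json

# Hypothetical attribute schema for illustration only; the attributes and
# classes used in the actual V-RoAst dataset may differ.
ATTRIBUTES = {
    "number_of_lanes": ["one", "two", "three_or_more"],
    "street_lighting": ["present", "not_present"],
    "pedestrian_crossing": ["present", "not_present"],
}


def build_prompt(attributes):
    """Build a text prompt asking for one class per attribute, as JSON."""
    lines = [
        "You are assessing a street-level image for road safety.",
        "For each attribute, choose exactly one of the allowed classes.",
        "Reply with a single JSON object and nothing else.",
        "",
    ]
    for name, classes in attributes.items():
        lines.append(f"- {name}: {', '.join(classes)}")
    return "\n".join(lines)


def parse_answer(raw_reply, attributes):
    """Extract the JSON object from a reply and keep only valid classes.

    Attributes that are missing or carry an unknown class map to None.
    """
    empty = {name: None for name in attributes}
    start, end = raw_reply.find("{"), raw_reply.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(raw_reply[start:end + 1])
    except json.JSONDecodeError:
        return empty
    result = {}
    for name, classes in attributes.items():
        value = data.get(name)
        result[name] = value if value in classes else None
    return result


if __name__ == "__main__":
    print(build_prompt(ATTRIBUTES))
    # A made-up reply standing in for a VLM response.
    reply = 'Sure! {"number_of_lanes": "two", "street_lighting": "maybe"}'
    print(parse_answer(reply, ATTRIBUTES))
```

Validating the reply against a closed set of classes is one simple way to keep the output usable for downstream scoring when a model adds extra text or invents a label; the paper itself may handle this differently.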