Abstract: Pedestrian detection plays a crucial role in autonomous driving: by identifying the position, size, orientation, and dynamic features of pedestrians in images or videos, it helps autonomous vehicles make better decisions and control actions. The performance of pedestrian detection models depends largely on the quality and diversity of the available training data, yet current autonomous driving datasets are limited in diversity, scale, and quality. In recent years, numerous studies have proposed data augmentation strategies to expand dataset coverage and make the most of existing training data. However, these methods often overlook the diversity of data scenarios. To address this gap, we propose a more comprehensive data augmentation method based on image descriptions and diffusion models, which aims to cover a wider range of scene variations, including different weather and lighting conditions. We first design a classifier to select the data samples to augment. We then extract visual features based on image captions and convert them into high-level semantic information that serves as the textual description of each selected sample. Finally, we use diffusion models to generate new variants. We also design three modification patterns that increase the diversity of the data in weather conditions, lighting, and pedestrian poses. Extensive experiments on the KITTI dataset and in real-world environments demonstrate that the proposed method significantly enhances the performance of pedestrian detection models in complex scenarios. This careful treatment of data augmentation is expected to notably improve the applicability and robustness of pedestrian detection models in real autonomous driving scenarios.
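The pipeline described in the abstract (select samples, describe them in text, then modify the description along weather, lighting, and pose before generation) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `Sample` type, the `detector_score` field and threshold standing in for the selection classifier, the vocabularies, and the prompt templates. The diffusion step itself is left out; the sketch only builds the text prompts that such a model would be conditioned on.

```python
from dataclasses import dataclass

# Hypothetical vocabularies: the abstract names the three aspects
# (weather, lighting, pose) but not the concrete values used.
WEATHER = ("rainy", "foggy", "snowy")
LIGHTING = ("at dusk", "at night")
POSES = ("walking", "running")


@dataclass(frozen=True)
class Sample:
    image_id: str
    caption: str           # text description obtained from an image captioner
    detector_score: float  # assumed stand-in for the selection classifier's output


def select_for_augmentation(samples, threshold=0.5):
    """Keep samples whose (assumed) score falls below the threshold."""
    return [s for s in samples if s.detector_score < threshold]


def weather_prompts(caption):
    """Modification pattern 1: vary the weather condition."""
    return [f"{caption}, {w} weather" for w in WEATHER]


def lighting_prompts(caption):
    """Modification pattern 2: vary the lighting."""
    return [f"{caption}, {l}" for l in LIGHTING]


def pose_prompts(caption):
    """Modification pattern 3: vary the pedestrian pose."""
    return [f"{caption}, pedestrian {p}" for p in POSES]


def build_prompts(samples, threshold=0.5):
    """Map each selected image id to the prompts for its generated variants."""
    prompts = {}
    for s in select_for_augmentation(samples, threshold):
        prompts[s.image_id] = (
            weather_prompts(s.caption)
            + lighting_prompts(s.caption)
            + pose_prompts(s.caption)
        )
    return prompts


if __name__ == "__main__":
    data = [
        Sample("000001", "a pedestrian crossing a city street", 0.3),
        Sample("000002", "two pedestrians on a sidewalk", 0.9),
    ]
    for image_id, variants in build_prompts(data).items():
        for prompt in variants:
            print(image_id, "->", prompt)
```

In a full system, each prompt would be passed together with the source image to an image-conditioned diffusion model, and the original bounding-box labels would need to be checked against the generated variant, especially for pose changes.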

Data Augmentation in Human-Centric Vision

A Comprehensive Survey on Data Augmentation

Image Data Augmentation for Deep Learning: A Survey

A Brief Survey on Semantic-preserving Data Augmentation

Image data augmentation techniques based on deep learning: A survey

Facial Landmarks Based Region-Level Data Augmentation for Gaze Estimation

A survey of synthetic data augmentation methods in computer vision

Improving the Robustness of Pedestrian Detection in Autonomous Driving with Generative Data Augmentation

A survey on Image Data Augmentation for Deep Learning

A Survey on Data Augmentation in Large Model Era

A Survey on Data Augmentation Methods Based on GAN in Computer Vision

PhysAug: A Physical-guided and Frequency-based Data Augmentation for Single-Domain Generalized Object Detection

A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning

Frontiers and Developments of Data Augmentation for Image: from Unlearnable to Learnable

Understanding Data Augmentation from a Robustness Perspective

A Survey of Automated Data Augmentation Algorithms for Deep Learning-based Image Classification Tasks

Data Augmentation in Classification and Segmentation: A Survey and New Strategies

A Data Augmentation Method Based on Multi-Modal Image Fusion for Detection and Segmentation

Data Augmentation using Generative-AI