Diverse Generation while Maintaining Semantic Coordination: A Diffusion-Based Data Augmentation Method for Object Detection

Sen Nie, Zhuo Wang, Xinxin Wang, Kun He
2024-08-06
Abstract: Recent studies emphasize the crucial role of data augmentation in enhancing the performance of object detection models. However, existing methodologies often struggle to effectively harmonize dataset diversity with semantic coordination. To bridge this gap, we introduce an innovative augmentation technique leveraging pre-trained conditional diffusion models to mediate this balance. Our approach encompasses the development of a Category Affinity Matrix, meticulously designed to enhance dataset diversity, and a Surrounding Region Alignment strategy, which ensures the preservation of semantic coordination in the augmented images. Extensive experimental evaluations confirm the efficacy of our method in enriching dataset diversity while seamlessly maintaining semantic coordination. Our method yields substantial average improvements of +1.4 AP, +0.9 AP, and +3.4 AP over existing alternatives on three distinct object detection models, respectively.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to increase dataset diversity in object detection while preserving semantic coordination. Existing data augmentation methods struggle to balance the two. Methods built on geometric transformations and similar minor modifications preserve the overall semantic consistency of an image, but the diversity they add is limited. Methods based on generative models can increase dataset diversity, but they have difficulty keeping the generated content semantically consistent with the rest of the image.

To overcome these limitations, the authors propose a diffusion-based data augmentation method built on two key techniques:

1. **Category Affinity Matrix**: A matrix is constructed from the visual and semantic similarities between categories. When an original object is replaced, the matrix guides the generative model toward categories that have an affinity with the original one, so dataset diversity increases in a controlled way.
2. **Surrounding Region Alignment**: Information extracted from the diffusion process of the original image is combined with the new diffusion process. This preserves the semantic integrity of the generated image and prevents a semantic disconnect between the generated object and its background.

Experiments show average improvements of +1.4 AP, +0.9 AP, and +3.4 AP on three different object detection models, respectively, supporting the claim that the method increases dataset diversity while maintaining semantic coordination. The method also performs well on specific categories and on fine-grained datasets, with improvements of +3.6 AP and +4.4 AP, respectively.
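The summary does not say how the visual and semantic similarities behind the Category Affinity Matrix are measured or combined. The sketch below is a minimal illustration under stated assumptions, not the authors' code: each category is represented by a hypothetical visual feature vector (for example, averaged embeddings of object crops) and a hypothetical text embedding of its name, similarity is cosine similarity, the two matrices are mixed with an assumed `weight`, and a replacement category is drawn from a temperature-scaled softmax over one row. Excluding a class from replacing itself is also an assumption.

```python
import numpy as np

def cosine_similarity_matrix(feats):
    """Pairwise cosine similarity between the rows of a (num_classes, dim) array."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return normed @ normed.T

def category_affinity_matrix(visual_feats, text_feats, weight=0.5):
    """ASSUMED combination: weighted average of visual and semantic similarity.
    The diagonal is set to -inf so a class is never 'replaced' by itself."""
    affinity = (weight * cosine_similarity_matrix(visual_feats)
                + (1.0 - weight) * cosine_similarity_matrix(text_feats))
    np.fill_diagonal(affinity, -np.inf)
    return affinity

def replacement_probs(affinity, src_class, temperature=0.1):
    """Softmax over one row: higher-affinity classes are likelier replacements."""
    logits = affinity[src_class] / temperature
    logits = logits - logits[np.isfinite(logits)].max()  # numerical stability
    probs = np.exp(logits)  # exp(-inf) == 0 for the excluded diagonal entry
    return probs / probs.sum()

# Toy example with 3 hypothetical classes: 0 and 1 are similar, 2 is unrelated.
visual = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
text = np.array([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.0, 0.0, 1.0]])
affinity = category_affinity_matrix(visual, text)
probs = replacement_probs(affinity, src_class=0)
rng = np.random.default_rng(0)
new_class = rng.choice(len(probs), p=probs)  # category used to replace class 0
```

A low `temperature` concentrates replacements on the closest categories (here class 1 for class 0), while a higher one spreads them out; this trade-off mirrors the diversity-versus-coordination balance the paper targets.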
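The summary describes Surrounding Region Alignment only at a high level ("information from the original diffusion process" combined with "the new diffusion process"). One common way to realize that idea is mask-based latent blending, as used by some diffusion inpainting samplers: at every denoising step, the region around the object is overwritten with the original image's latent noised to the same level, so only the masked object area is generated anew. The sketch below implements that generic blending technique as a stand-in; it is an assumption, not the paper's exact procedure. `denoise_step` is a hypothetical placeholder for one reverse step of a pre-trained conditional diffusion model, and the linear beta schedule is a standard DDPM choice.

```python
import numpy as np

def q_sample(x0, t, alphas_cumprod, rng):
    """Standard DDPM forward noising: draw x_t ~ q(x_t | x_0)."""
    a_bar = alphas_cumprod[t]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise

def sample_with_surrounding_alignment(x0_orig, object_mask, denoise_step,
                                      alphas_cumprod, seed=0):
    """Generate new content inside object_mask (1 = object, 0 = surroundings)
    while tying the surroundings to the original image's diffusion trajectory.

    denoise_step(x_t, t) is a HYPOTHETICAL stand-in for one reverse step of a
    pre-trained conditional diffusion model, mapping x_t to x_{t-1}.
    """
    rng = np.random.default_rng(seed)
    x_t = rng.standard_normal(x0_orig.shape)  # start from pure noise
    for t in reversed(range(len(alphas_cumprod))):
        x_prev = denoise_step(x_t, t)  # new process: x_t -> x_{t-1}
        # Original process at the same noise level (clean image at the last step).
        x_orig_prev = (q_sample(x0_orig, t - 1, alphas_cumprod, rng)
                       if t > 0 else x0_orig)
        # Keep generated content inside the mask, original content around it.
        x_t = object_mask * x_prev + (1.0 - object_mask) * x_orig_prev
    return x_t

# Toy usage: a 4x4 "latent", a 2x2 object region, and a dummy denoiser.
alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 50))
x0 = np.ones((4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
out = sample_with_surrounding_alignment(x0, mask, lambda x, t: 0.9 * x,
                                        alphas_cumprod)
```

Because the surroundings follow the original trajectory at every step, the model conditions each denoising step on a background that is consistent with the source image, which is the kind of object-background coherence the summary attributes to Surrounding Region Alignment.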