Open-Vocabulary Object Detectors: Robustness Challenges under Distribution Shifts

Prakash Chandra Chhipa, Kanjar De, Meenakshi Subhash Chippa, Rajkumar Saini, Marcus Liwicki
2024-09-06
Abstract: Out-of-distribution (OOD) robustness remains a critical hurdle to deploying deep vision models. Vision-Language Models (VLMs) have recently achieved groundbreaking results. VLM-based open-vocabulary object detection extends the capabilities of traditional object detection frameworks, enabling the recognition and classification of objects beyond predefined categories. Investigating the OOD robustness of recent open-vocabulary object detectors is essential to increase the trustworthiness of these models. This study presents a comprehensive robustness evaluation of the zero-shot capabilities of three recent open-vocabulary (OV) foundation object detection models: OWL-ViT, YOLO World, and Grounding DINO. Experiments on the robustness benchmarks COCO-O, COCO-DC, and COCO-C, which encompass distribution shifts due to information loss, corruption, adversarial attacks, and geometrical deformation, highlight the challenges to the models' robustness and aim to foster research toward achieving it. Project page: https://prakashchhipa.github.io/projects/ovod_robustness
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper primarily explores the robustness challenges of Open-Vocabulary Object Detection (OV-OD) models in Out-Of-Distribution (OOD) scenarios. Specifically:

1. **Robustness Evaluation**:
   - The focus of the study is on evaluating the zero-shot capabilities of three state-of-the-art open-vocabulary foundational object detection models (OWL-ViT, YOLO World, and Grounding DINO) on different OOD benchmarks.
   - The experiments cover COCO-O, COCO-DC, and COCO-C benchmarks, which include distribution changes such as information loss, data corruption, adversarial attacks, and geometric transformations.
2. **Model Comparison and Analysis**:
   - Detailed experimental results compare the performance of these three models across different benchmarks, revealing their performance degradation when facing various distribution changes.
   - Special attention is given to the consistency and adaptability of these models under various conditions.
3. **Directions for Robustness Improvement**:
   - The paper highlights the shortcomings of open-vocabulary object detection models in terms of robustness and proposes future research directions to enhance the reliability and robustness of these models in real-world applications.

Through this research, the paper aims to advance the field of open-vocabulary object detection and improve the performance of models when encountering unknown and complex environments.
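
The performance degradation under distribution shift described above can be made concrete with a small calculation. The summary does not state which metric the paper reports, so the following is only a minimal sketch under an assumption: each model has one detection score (for example, mAP) on clean data and one on a shifted benchmark, and robustness is compared by the fraction of the clean score that is lost. The function names `relative_degradation` and `rank_by_robustness` are hypothetical and not taken from the paper.

```python
def relative_degradation(clean_score: float, shifted_score: float) -> float:
    """Return the fraction of the clean-data score lost under a distribution shift.

    Assumption: both scores are on the same scale (e.g. mAP in percent).
    """
    if clean_score <= 0:
        raise ValueError("clean_score must be positive")
    return (clean_score - shifted_score) / clean_score


def rank_by_robustness(scores: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Sort models from smallest to largest relative degradation.

    `scores` maps a model name to a (clean_score, shifted_score) pair.
    """
    drops = {
        name: relative_degradation(clean, shifted)
        for name, (clean, shifted) in scores.items()
    }
    return sorted(drops.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Placeholder values for illustration only; these are NOT results from the paper.
    example = {"model_a": (40.0, 30.0), "model_b": (50.0, 45.0)}
    for name, drop in rank_by_robustness(example):
        print(f"{name}: {drop:.1%} relative drop")
```

Ranking by relative rather than absolute drop keeps a model with a high clean score from looking less robust merely because it has more to lose; whether that is the right comparison depends on the deployment setting.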