Open-Vocabulary Object Detectors: Robustness Challenges under Distribution Shifts

Prakash Chandra Chhipa, Kanjar De, Meenakshi Subhash Chippa, Rajkumar Saini, Marcus Liwicki

2024-09-06

Abstract: The challenge of Out-Of-Distribution (OOD) robustness remains a critical hurdle towards deploying deep vision models. Vision-Language Models (VLMs) have recently achieved groundbreaking results. VLM-based open-vocabulary object detection extends the capabilities of traditional object detection frameworks, enabling the recognition and classification of objects beyond predefined categories. Investigating OOD robustness in recent open-vocabulary object detection is essential to increase the trustworthiness of these models. This study presents a comprehensive robustness evaluation of the zero-shot capabilities of three recent open-vocabulary (OV) foundation object detection models: OWL-ViT, YOLO World, and Grounding DINO. Experiments carried out on the robustness benchmarks COCO-O, COCO-DC, and COCO-C, which encompass distribution shifts due to information loss, corruption, adversarial attacks, and geometrical deformation, highlight the challenges to the models' robustness and aim to foster research toward achieving it. Project page: https://prakashchhipa.github.io/projects/ovod_robustness

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper primarily explores the robustness challenges of Open-Vocabulary Object Detection (OV-OD) models in Out-Of-Distribution (OOD) scenarios. Specifically:

1. **Robustness Evaluation**:
   - The focus of the study is on evaluating the zero-shot capabilities of three state-of-the-art open-vocabulary foundational object detection models (OWL-ViT, YOLO World, and Grounding DINO) on different OOD benchmarks.
   - The experiments cover the COCO-O, COCO-DC, and COCO-C benchmarks, which include distribution changes such as information loss, data corruption, adversarial attacks, and geometric transformations.
2. **Model Comparison and Analysis**:
   - Detailed experimental results compare the performance of these three models across different benchmarks, revealing their performance degradation when facing various distribution changes.
   - Special attention is given to the consistency and adaptability of these models under various conditions.
3. **Directions for Robustness Improvement**:
   - The paper highlights the shortcomings of open-vocabulary object detection models in terms of robustness and proposes future research directions to enhance the reliability and robustness of these models in real-world applications.

Through this research, the paper aims to advance the field of open-vocabulary object detection and improve the performance of models when encountering unknown and complex environments.
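One common way to quantify the performance degradation mentioned above is the relative drop in mAP between a clean reference set and a shifted benchmark. The following is a minimal sketch of that idea, not the paper's evaluation code: the helper names `relative_drop` and `summarize` are hypothetical, and the mAP values are placeholders chosen for illustration, not results reported by the authors.

```python
from typing import Dict


def relative_drop(clean_map: float, shifted_map: float) -> float:
    """Fraction of clean mAP lost under a distribution shift (0.0 = no loss)."""
    if clean_map <= 0:
        raise ValueError("clean_map must be positive")
    return (clean_map - shifted_map) / clean_map


def summarize(clean_map: float, shifted: Dict[str, float]) -> Dict[str, float]:
    """Map each benchmark name to its relative mAP drop, largest drop first."""
    drops = {name: relative_drop(clean_map, m) for name, m in shifted.items()}
    return dict(sorted(drops.items(), key=lambda kv: kv[1], reverse=True))


# Placeholder numbers for illustration only; NOT results from the paper.
example = summarize(40.0, {"COCO-O": 30.0, "COCO-DC": 36.0, "COCO-C": 20.0})
for name, drop in example.items():
    print(f"{name}: {drop:.1%} relative mAP drop")
```

Normalizing by the clean score makes detectors with different baseline accuracy comparable; the absolute drop (`clean_map - shifted_map`) is an equally reasonable choice when baselines are similar.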

OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

What Makes Good Open-Vocabulary Detector: A Disassembling Perspective

On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes

COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts

From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects

Unveiling the unseen: novel strategies for object detection beyond known distributions

LP-OVOD: Open-Vocabulary Object Detection by Linear Probing

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection

OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Open-Vocabulary Object Detection with an Open Corpus

Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection

How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection

Retrieval-Augmented Open-Vocabulary Object Detection

MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection

OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

LOVD: Large-and-Open Vocabulary Object Detection

Effective Robustness against Natural Distribution Shifts for Models with Different Training Data