Abstract: Open World Object Detection (OWOD) aims to adapt object detection to an open-world environment, so as to detect unknown objects and learn knowledge incrementally. Existing OWOD methods typically leverage training sets with a relatively small number of known objects. Lacking generic object knowledge, they fail to comprehensively perceive objects beyond the scope of their training sets. Recent advancements in large vision models (LVMs), trained on extensive large-scale data, offer a promising opportunity to harness rich generic knowledge for the fundamental advancement of OWOD. Motivated by the Segment Anything Model (SAM), a prominent LVM lauded for its exceptional ability to segment generic objects, we first demonstrate the feasibility of employing SAM for OWOD and establish the very first SAM-Guided OWOD baseline solution. Subsequently, we identify and address two fundamental challenges in SAM-Guided OWOD and propose a pioneering SAM-Guided Robust Open-world Detector (SGROD) method, which can significantly improve the recall of unknown objects without losing precision on known objects. Specifically, the two challenges in SAM-Guided OWOD are: (1) noisy labels caused by the class-agnostic nature of SAM; (2) precision degradation on known objects when more unknown objects are recalled. For the first problem, we propose a dynamic label assignment (DLA) method that adaptively selects confident labels from SAM during training, clearly reducing the impact of noise. For the second problem, we introduce cross-layer learning (CLL) and SAM-based negative sampling (SNS), which enable SGROD to avoid precision loss by learning robust decision boundaries for objectness and classification. Experiments on public datasets show that SGROD not only improves the recall of unknown objects by a large margin (~20%), but also preserves highly competitive precision on known objects. The code is available at https://github.com/harrylin-hyl/SGROD.

Recalling Unknowns without Losing Precision: An Effective Solution to Large Model-Guided Open World Object Detection

USD: Unknown Sensitive Detector Empowered by Decoupled Objectness and Segment Anything Model

Spatial Likelihood Voting with Self-Knowledge Distillation for Weakly Supervised Object Detection

Unsupervised Recognition of Unknown Objects for Open-World Object Detection

DDOWOD: DiffusionDet for Open-World Object Detection

Detecting the open-world objects with the help of the Brain

Sniffing Threatening Open-World Objects in Autonomous Driving by Open-Vocabulary Models

Open World Object Detection: A Survey

SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector

Annealing-Based Label-Transfer Learning for Open World Object Detection

OW-Adapter: Human-Assisted Open-World Object Detection with a Few Examples

UC-OWOD: Unknown-Classified Open World Object Detection

Text-Guided Unknown Pseudo-Labeling for Open-World Object Detection

From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects

UADet: A Remarkably Simple Yet Effective Uncertainty-Aware Open-Set Object Detection Framework

Universal Object Detection with Large Vision Model

WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition

MOL: Towards Accurate Weakly Supervised Remote Sensing Object Detection Via Multi-view Noisy Learning

YOLOOC: YOLO-based Open-Class Incremental Object Detection with Novel Class Discovery

Addressing the Challenges of Open-World Object Detection

Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts