Abstract: Detecting both known and unknown objects is a fundamental skill for robot manipulation in unstructured environments. Open-set object detection (OSOD) is a promising direction to handle the problem consisting of two subtasks: objects and background separation, and open-set object classification. In this paper, we present Openset RCNN to address the challenging OSOD. To disambiguate unknown objects and background in the first subtask, we propose to use classification-free region proposal network (CF-RPN) which estimates the objectness score of each region purely using cues from object's location and shape preventing overfitting to the training categories. To identify unknown objects in the second subtask, we propose to represent them using the complementary region of known categories in a latent space which is accomplished by a prototype learning network (PLN). PLN performs instance-level contrastive learning to encode proposals to a latent space and builds a compact region centering with a prototype for each known category. Further, we note that the detection performance of unknown objects can not be unbiasedly evaluated on the situation that commonly used object detection datasets are not fully annotated. Thus, a new benchmark is introduced by reorganizing GraspNet-1billion, a robotic grasp pose detection dataset with complete annotation. Extensive experiments demonstrate the merits of our method. We finally show that our Openset RCNN can endow the robot with an open-set perception ability to support robotic rearrangement tasks in cluttered environments. More details can be found at https://sites.google.com/view/openset-rcnn/
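
The abstract's idea of scoring objectness from location and shape, rather than from class labels, can be illustrated with a small sketch. The abstract does not say which cues CF-RPN actually uses; the example below assumes, purely for illustration, that a proposal's objectness target is its best IoU with any annotated box, which is one common class-agnostic measure of localization quality. The function names `box_iou` and `objectness_targets` are hypothetical and are not taken from the paper.

```python
import numpy as np


def box_iou(proposals, gt_boxes):
    """Pairwise IoU between proposals (N, 4) and ground-truth boxes (M, 4).

    Boxes are [x1, y1, x2, y2]. Returns an (N, M) array.
    """
    p = np.asarray(proposals, dtype=float).reshape(-1, 4)
    g = np.asarray(gt_boxes, dtype=float).reshape(-1, 4)

    # Corners of the intersection rectangle for every (proposal, gt) pair.
    x1 = np.maximum(p[:, None, 0], g[None, :, 0])
    y1 = np.maximum(p[:, None, 1], g[None, :, 1])
    x2 = np.minimum(p[:, None, 2], g[None, :, 2])
    y2 = np.minimum(p[:, None, 3], g[None, :, 3])

    # Clip negative widths/heights to zero for non-overlapping pairs.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)

    area_p = (p[:, 2] - p[:, 0]) * (p[:, 3] - p[:, 1])
    area_g = (g[:, 2] - g[:, 0]) * (g[:, 3] - g[:, 1])
    union = area_p[:, None] + area_g[None, :] - inter

    return inter / np.maximum(union, 1e-12)


def objectness_targets(proposals, gt_boxes):
    """Class-agnostic objectness target (assumption: best IoU with any box).

    No class label is consulted, so the target cannot encode which
    training category a box belongs to.
    """
    iou = box_iou(proposals, gt_boxes)
    if iou.shape[1] == 0:  # image without annotations
        return np.zeros(iou.shape[0])
    return iou.max(axis=1)


if __name__ == "__main__":
    gt = [[0, 0, 10, 10]]
    props = [[0, 0, 10, 10], [5, 0, 15, 10], [20, 20, 30, 30]]
    print(objectness_targets(props, gt))
```

Because the target depends only on box geometry, a network regressing it has no direct incentive to separate "training category" from "everything else", which is the overfitting the abstract says CF-RPN is meant to prevent.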

What problem does this paper attempt to address?

This paper addresses the challenges of **open-set object detection (OSOD)**. OSOD is the ability to detect known objects and also recognize unknown objects, which robots need when they operate in unstructured environments. Traditional closed-set object detectors can only detect the object classes that appear in the training set. In the real world the set of object classes is effectively unbounded, so closed-set detectors cannot meet the requirements of robots working in unstructured environments.

The main contributions of the paper are:

1. **A new method, Openset RCNN**: The method combines a classification-free region proposal network (CF-RPN) with a prototype learning network (PLN) based on instance-level contrastive learning, to improve both generalization to unknown objects and their identification.
2. **A new benchmark**: By reorganizing GraspNet-1billion, a fully annotated robotic grasp pose detection dataset, the paper creates a benchmark suited to evaluating OSOD performance. This avoids the incomplete annotation of commonly used object detection datasets.
3. **Extensive experimental validation**: The experiments demonstrate the merits of the method and show that Openset RCNN can give a robot the open-set perception needed to support rearrangement tasks in cluttered environments.

### Specific problem description

OSOD consists of two subtasks:

1. **Separating objects from the background**: Distinguishing objects, including unknown ones, from the image background.
2. **Open-set object classification**: Assigning each detected object to a known class or marking it as unknown.

The main challenges are:

- **Overfitting**: Common object detection datasets (such as PASCAL VOC and COCO) do not annotate every object, so the model tends to overfit to the training classes.
- **Evaluation bias**: A fair evaluation of OSOD performance, especially detection performance on unknown objects, requires a fully annotated dataset.

### Solution overview

To address these problems, the paper proposes:

- **CF-RPN**: The objectness score of each region is estimated using only object location and shape cues, which prevents overfitting to the training classes.
- **PLN**: Instance-level contrastive learning encodes proposals into a latent space, where a compact region centered on a prototype is built for each known class. Unknown objects are represented by the complementary region, that is, the part of the latent space outside all known-class regions.

With these components, the proposed method can better support robot perception and manipulation in unstructured environments.
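
The prototype idea in the solution overview can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's PLN: it takes each prototype to be the mean embedding of a class (the paper learns the latent space with instance-level contrastive learning, which is not reproduced here), and it treats the "compact region" as a Euclidean ball of a single fixed `radius`. The names `class_prototypes`, `classify_open_set`, `radius`, and the `UNKNOWN` label are hypothetical.

```python
import numpy as np

UNKNOWN = "unknown"  # hypothetical label for the complementary region


def class_prototypes(embeddings, labels):
    """Return (classes, prototypes); each prototype is a class mean (assumption)."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def classify_open_set(embeddings, classes, protos, radius):
    """Assign each embedding to its nearest prototype, or UNKNOWN.

    An embedding is accepted as a known class only if it lies within
    `radius` of that class's prototype; everything else falls in the
    complementary region and is labeled UNKNOWN.
    """
    x = np.atleast_2d(np.asarray(embeddings, dtype=float))
    # (N, K) Euclidean distances between N embeddings and K prototypes.
    dists = np.linalg.norm(x[:, None, :] - protos[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    min_dist = dists[np.arange(x.shape[0]), nearest]
    return [
        classes[i].item() if d <= radius else UNKNOWN
        for i, d in zip(nearest, min_dist)
    ]


if __name__ == "__main__":
    train_x = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]]
    train_y = [0, 0, 1, 1]
    classes, protos = class_prototypes(train_x, train_y)
    queries = [[0.1, 0.1], [5.0, 5.0], [2.5, 2.5]]
    print(classify_open_set(queries, classes, protos, radius=1.0))
```

In this toy setup the third query lies far from both prototypes, so it lands in the complementary region. The tighter the known-class clusters are in the latent space, the smaller `radius` can be, which is why the summary stresses compact per-class regions.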

Open-Set Object Detection Using Classification-free Object Proposal and Instance-level Contrastive Learning

Convolutional Prototype Network for Open Set Recognition

Open-set object detection: towards unified problem formulation and benchmarking

Towards Open-set Camera 3D Object Detection

OpenSlot: Mixed Open-set Recognition with Object-centric Learning

Open-Set 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning

OPODet: Toward Open World Potential Oriented Object Detection in Remote Sensing Images

P-ODN: Prototype based Open Deep Network for Open Set Recognition

Rectifying Open-set Object Detection: A Taxonomy, Practical Applications, and Proper Evaluation

OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery

Addressing the Challenges of Open-World Object Detection

Object Detectors in the Open Environment: Challenges, Solutions, and Outlook

Object Detection in Remote Sensing Imagery Based on Prototype Learning Network With Proposal Relation

Towards Evidential and Class Separable Open Set Object Detection

Synergetic proto-pull and reciprocal points for open set recognition

Enhancing Open-Set Object Detection Via Uncertainty-Boxes Identification

Unsupervised Recognition of Unknown Objects for Open-World Object Detection

Contrastive Open Set Recognition

Open-world object detection: A solution based on reselection mechanism and feature disentanglement

UADet: A Remarkably Simple Yet Effective Uncertainty-Aware Open-Set Object Detection Framework

Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning