Open-Set Object Detection Using Classification-free Object Proposal and Instance-level Contrastive Learning

Zhongxiang Zhou, Yifei Yang, Yue Wang, Rong Xiong
DOI: https://doi.org/10.1109/LRA.2023.3242169
2023-12-04
Abstract: Detecting both known and unknown objects is a fundamental skill for robot manipulation in unstructured environments. Open-set object detection (OSOD) is a promising direction to handle the problem, and it consists of two subtasks: object and background separation, and open-set object classification. In this paper, we present Openset RCNN to address the challenging OSOD task. To disambiguate unknown objects and background in the first subtask, we propose to use a classification-free region proposal network (CF-RPN), which estimates the objectness score of each region purely using cues from the object's location and shape, preventing overfitting to the training categories. To identify unknown objects in the second subtask, we propose to represent them using the complementary region of known categories in a latent space, which is accomplished by a prototype learning network (PLN). PLN performs instance-level contrastive learning to encode proposals into a latent space and builds a compact region centered on a prototype for each known category. Further, we note that the detection performance on unknown objects cannot be evaluated without bias because commonly used object detection datasets are not fully annotated. Thus, a new benchmark is introduced by reorganizing GraspNet-1billion, a robotic grasp pose detection dataset with complete annotation. Extensive experiments demonstrate the merits of our method. We finally show that our Openset RCNN can endow the robot with an open-set perception ability to support robotic rearrangement tasks in cluttered environments. More details can be found at https://sites.google.com/view/openset-rcnn/
Computer Vision and Pattern Recognition, Robotics
### What problem does this paper attempt to address?

This paper addresses **open-set object detection (OSOD)**: detecting known objects while also recognizing unknown objects, an ability robots need when manipulating objects in unstructured environments. Traditional closed-set object detectors can only detect the object classes that appear in the training set. The set of object classes in the real world is effectively unbounded, so closed-set detectors cannot meet the requirements of robots working in such environments.

The main contributions of the paper are:

1. **A new method, Openset RCNN**: It combines a classification-free region proposal network (CF-RPN) with a prototype learning network (PLN) based on instance-level contrastive learning, so that the detector generalizes to unknown objects and can identify them.
2. **A new benchmark dataset**: By reorganizing the fully annotated GraspNet-1billion dataset, the authors create a benchmark suited to evaluating OSOD performance, which avoids the incomplete annotation of common object detection datasets.
3. **Experimental validation**: Extensive experiments demonstrate the merits of the method, and the authors show that Openset RCNN can support robotic rearrangement tasks in cluttered environments.

### Specific problem description

OSOD consists of two subtasks:

1. **Separating objects from the background**: Distinguishing objects, including unknown ones, from the background in an image.
2. **Open-set object classification**: Assigning each detected object to a known category or identifying it as unknown.

The main challenges are:

- **Over-fitting**: Common object detection datasets (such as PASCAL VOC and COCO) do not annotate every object, so a model trained on them is prone to over-fit to the training classes.
- **Evaluation bias**: Evaluating OSOD performance fairly, especially detection performance on unknown objects, requires a fully annotated dataset.

### Solution overview

To address these problems, the paper proposes:

- **CF-RPN**: The objectness score of each region is estimated using only cues from the object's location and shape, which prevents over-fitting to the training classes.
- **PLN**: Instance-level contrastive learning encodes proposals into a latent space, where a compact region centered on a prototype is built for each known category. Unknown objects are represented by the complementary region, that is, the part of the latent space outside every known-category region.

Together, these components are intended to support robot perception and manipulation in unstructured environments.
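
The prototype idea behind PLN can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the fixed `radius`, the hinge-style loss, and the names `classify_open_set` and `contrastive_loss` are all assumptions chosen for illustration. The sketch shows how a proposal embedding might be assigned to the nearest known-category prototype, or labeled unknown when it falls outside every known-category region, and how a contrastive objective might pull an embedding toward its own prototype while pushing it away from the others.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_open_set(embedding, prototypes, radius):
    """Return the nearest known label, or "unknown" if the embedding
    lies farther than `radius` from every prototype (assumed rule)."""
    best_label, best_dist = None, math.inf
    for label, proto in prototypes.items():
        dist = l2_distance(embedding, proto)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= radius else "unknown"


def contrastive_loss(embedding, label, prototypes, margin):
    """Hypothetical hinge-style loss: pull toward the prototype of the
    embedding's own class, push away from other prototypes that are
    closer than `margin`."""
    pull = l2_distance(embedding, prototypes[label])
    push = sum(
        max(0.0, margin - l2_distance(embedding, proto))
        for other, proto in prototypes.items()
        if other != label
    )
    return pull + push


# Toy 2-D latent space with two known categories (made-up values).
prototypes = {"box": [0.0, 0.0], "bottle": [4.0, 0.0]}

print(classify_open_set([0.3, 0.4], prototypes, radius=1.0))  # box
print(classify_open_set([2.0, 3.0], prototypes, radius=1.0))  # unknown
```

In this toy setup the second embedding is about 3.6 units from both prototypes, so it lands in the complementary region and is reported as unknown. A real system would learn the embeddings and prototypes jointly and would choose the region size from validation data rather than fixing it by hand.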