CRNet: Context-guided Reasoning Network for Detecting Hard Objects

Jiaxu Leng, Yiran Liu, Xinbo Gao, Zhihui Wang
DOI: https://doi.org/10.1109/tmm.2023.3315558
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Recent studies have shown impressive performance in object detection. However, most current detectors rely only on appearance features to locate and classify objects and disregard or underestimate the valuable contextual information in the image, which limits detection performance on hard objects such as small, occluded, or blurred ones. In this paper, we instead seek to build a novel context modeling framework and conduct more effective context reasoning for object detection. Specifically, we design a Context-guided Reasoning Network (CRNet) to explore the relationships between objects and use easily detected objects to help understand hard ones. In CRNet, an image is modeled as a graph, and the local features of objects are viewed as nodes of the graph in order to learn the relationships between objects. By passing contextual information through the built graph, the features of hard objects can be updated to become more discriminative. To this end, we first develop a cascaded center prediction module built upon CenterNet to produce a set of high-quality proposals, which are viewed as nodes of the graph. In addition, to maximize the value of global context information, we present a multi-granularity feature fusion network to encode the whole-scene information, which is also viewed as nodes of the graph. Then, the spatial and semantic relationships between objects are learned to initialize the edges of the graph. Finally, context reasoning is conducted to update the node states iteratively. Extensive experiments are conducted on MS COCO and Pascal VOC to demonstrate the effectiveness of the proposed CRNet. Experimental results show that the proposed CRNet greatly improves detection performance over existing context-based detectors and is comparable with state-of-the-art detectors.
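The graph-based reasoning step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract says the edges are learned, whereas the sketch below swaps in fixed, hand-written affinities (cosine similarity for the semantic part, a Gaussian kernel on box-center distance for the spatial part) and a simple weighted-average update. The function names `build_edges` and `context_reasoning`, and the parameters `sigma`, `steps`, and `alpha`, are hypothetical and chosen only for illustration. It assumes at least two proposals.

```python
import numpy as np


def build_edges(features, centers, sigma=0.5):
    """Build a row-normalized edge matrix from node features and box centers.

    Assumption: fixed affinities stand in for the learned spatial and
    semantic relationships mentioned in the abstract.
    """
    # Semantic affinity: cosine similarity between node features.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-8)
    semantic = unit @ unit.T

    # Spatial affinity: Gaussian kernel on pairwise center distance.
    diff = centers[:, None, :] - centers[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    spatial = np.exp(-dist2 / (2.0 * sigma ** 2))

    logits = semantic * spatial
    np.fill_diagonal(logits, -np.inf)  # exclude self-edges

    # Row-wise softmax so each node's incoming weights sum to 1.
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def context_reasoning(features, centers, steps=3, alpha=0.5):
    """Iteratively mix each node's state with messages from its neighbors."""
    edges = build_edges(features, centers)
    state = features.astype(float).copy()
    for _ in range(steps):
        messages = edges @ state  # aggregate neighbor states
        state = (1.0 - alpha) * state + alpha * messages
    return state


if __name__ == "__main__":
    # Three hypothetical proposals: 2-D features and normalized box centers.
    feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    centers = np.array([[0.1, 0.1], [0.2, 0.1], [0.9, 0.9]])
    print(context_reasoning(feats, centers, steps=2))
```

In this toy version, a node with a weak feature vector is pulled toward the features of nearby, semantically similar nodes, which mirrors the idea of using easily detected objects to help interpret hard ones. A whole-scene node, as described in the abstract, could be appended as an extra row of `features`, though how its position would be represented is left open here.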