Deformable Template Network (DTN) for Object Detection

Shuai Wu, Yong Xu, Bob Zhang, Jian Yang, David Zhang
DOI: https://doi.org/10.1109/tmm.2021.3075323
IF: 7.3
2022-01-01
IEEE Transactions on Multimedia
Abstract: Objects often have different appearances because of viewpoint changes or part deformation. Modeling these variations reasonably remains a major challenge for object detection. In this paper, we propose a novel Deformable Template Network (DTN), which exploits the pictorial structure to model the possible variations of an object. DTN represents an object deformably by means of a generated template. It has two key modules: the template generating module and the part matching module. The template generating module produces a template for a given object that defines the anchor positions of its $k \times k$ parts. Based on this template, the part matching module aligns each part around its anchor position. For each part, the matching process trades off maximizing the detection score against minimizing the deformation cost relative to the anchor position. Moreover, because DTN is a fully convolutional network, it is competitive in detection efficiency. We evaluate DTN on the PASCAL VOC and MSCOCO datasets, where it achieves state-of-the-art results: an accuracy of 82.7% on PASCAL VOC and 44.9% on MSCOCO.
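The abstract describes part matching only at a high level: each of the $k \times k$ parts may move away from its anchor, and the chosen position balances a high detection score against a deformation penalty. The Python sketch below illustrates that trade-off in the classic pictorial-structure style; it is not DTN's actual implementation. Several simplifications are assumptions: the template is a fixed uniform grid (DTN generates its template with a network), the deformation cost is quadratic, $\alpha(dx^2 + dy^2)$, and displacements are searched exhaustively within a square window. The function names `grid_anchors`, `match_part`, and `match_object` and the parameters `radius` and `alpha` are hypothetical.

```python
from typing import List, Tuple

Score = List[List[float]]  # 2-D per-part score map, indexed as score[y][x]
Point = Tuple[int, int]    # (x, y) position


def grid_anchors(x0: int, y0: int, w: int, h: int, k: int) -> List[Point]:
    """Hypothetical template: anchor at the centre of each cell of a uniform
    k x k grid laid over the box (x0, y0, w, h), listed row by row.
    (DTN instead generates its template; this fixed grid is a simplification.)"""
    anchors = []
    for i in range(k):        # grid row
        for j in range(k):    # grid column
            ax = x0 + int((j + 0.5) * w / k)
            ay = y0 + int((i + 0.5) * h / k)
            anchors.append((ax, ay))
    return anchors


def match_part(score: Score, anchor: Point, radius: int,
               alpha: float = 1.0) -> Tuple[Point, float]:
    """Search displacements |dx|, |dy| <= radius around the anchor and return
    the position maximising score(anchor + d) - alpha * (dx^2 + dy^2),
    together with that value. The quadratic cost is an assumed DPM-style choice."""
    ax, ay = anchor
    height, width = len(score), len(score[0])
    best_pos, best_val = anchor, float("-inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = ax + dx, ay + dy
            if 0 <= x < width and 0 <= y < height:  # stay inside the map
                val = score[y][x] - alpha * (dx * dx + dy * dy)
                if val > best_val:
                    best_pos, best_val = (x, y), val
    return best_pos, best_val


def match_object(part_scores: List[Score], anchors: List[Point], radius: int,
                 alpha: float = 1.0) -> Tuple[List[Point], float]:
    """Match every part independently and sum the per-part trade-off values
    into a single object score."""
    positions: List[Point] = []
    total = 0.0
    for score, anchor in zip(part_scores, anchors):
        pos, val = match_part(score, anchor, radius, alpha)
        positions.append(pos)
        total += val
    return positions, total


if __name__ == "__main__":
    score = [[0.0] * 8 for _ in range(8)]
    score[3][5] = 3.0  # strong response one step up and right of anchor (4, 4)
    print(match_part(score, (4, 4), radius=2, alpha=1.0))  # part moves to (5, 3)
    print(match_part(score, (4, 4), radius=2, alpha=2.0))  # part stays at anchor
```

In the usage example, the strong response at (5, 3) has displacement cost $2\alpha$. With $\alpha = 1$ the net value $3 - 2 = 1$ beats staying at the anchor (value 0), so the part moves. With $\alpha = 2$ the net value $3 - 4 = -1$ loses, so the part stays put. This mirrors, in toy form, the score-versus-deformation trade-off the abstract describes. A real detector would compute this over dense feature maps with convolutional operations rather than Python loops.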