Data-driven hierarchical structure kernel for multiscale part-based object recognition

Botao Wang, Hongkai Xiong, Xiaoqian Jiang, Yuan F Zheng
DOI: https://doi.org/10.1109/TIP.2014.2307480
Abstract: Detecting generic object categories in images and videos is a fundamental issue in computer vision. However, it faces challenges from inter- and intraclass diversity, as well as distortions caused by viewpoints, poses, deformations, and so on. To handle object variations, this paper constructs a structure kernel and proposes a multiscale part-based model incorporating the discriminative power of kernels. The structure kernel measures the resemblance of part-based objects in three aspects: 1) the global similarity term measures the resemblance of the global visual appearance of relevant objects; 2) the part similarity term measures the resemblance of the visual appearance of distinctive parts; and 3) the spatial similarity term measures the resemblance of the spatial layout of parts. In essence, the deformation of parts in the structure kernel is penalized in a multiscale space with respect to horizontal displacement, vertical displacement, and scale difference. Part similarities are combined with different weights, which are optimized efficiently by the normalized stochastic gradient ascent algorithm to maximize the intraclass similarities and minimize the interclass similarities. In addition, the parameters of the structure kernel are learned during the training process with regard to the distribution of the data, in a more discriminative way. With parts that are flexible in scale and displacement, the model is more robust to intraclass variations, poses, and viewpoints. Theoretical analysis and experimental evaluations demonstrate that the proposed multiscale part-based representation model with structure kernel exhibits accurate and robust performance, and outperforms state-of-the-art object classification approaches.
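The three terms of the structure kernel can be illustrated with a minimal sketch. The abstract does not give the functional form, so everything concrete here is an assumption made for illustration: the Gaussian similarity, the way the terms are combined (global term plus a weighted sum of part appearance times spatial agreement), and all names (`rbf`, `spatial_similarity`, `structure_kernel`, `gamma`, `sigma`). It is not the authors' formulation, and the weights are passed in by hand rather than learned.

```python
import math


def rbf(u, v, gamma=1.0):
    """Gaussian similarity between two equal-length feature vectors (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def spatial_similarity(p, q, sigma=(1.0, 1.0, 1.0)):
    """Penalize differences in horizontal position, vertical position, and scale.

    p and q are (x, y, scale) placements of the same part in two objects.
    sigma holds one hypothetical bandwidth per dimension.
    """
    penalty = sum(((a - b) / s) ** 2 for a, b, s in zip(p, q, sigma))
    return math.exp(-penalty)


def structure_kernel(obj_a, obj_b, weights, gamma=1.0, sigma=(1.0, 1.0, 1.0)):
    """Global similarity plus a weighted sum over parts (assumed combination).

    Each part contributes its appearance similarity multiplied by the
    spatial similarity of its placement, scaled by that part's weight.
    """
    k = rbf(obj_a["global"], obj_b["global"], gamma)
    for w, part_a, part_b in zip(weights, obj_a["parts"], obj_b["parts"]):
        appearance = rbf(part_a["feat"], part_b["feat"], gamma)
        layout = spatial_similarity(part_a["pos"], part_b["pos"], sigma)
        k += w * appearance * layout
    return k


# Toy objects: one global descriptor and two parts, each with a feature
# vector and an (x, y, scale) placement. All values are made up.
obj_a = {
    "global": [1.0, 0.0],
    "parts": [
        {"feat": [1.0], "pos": (0.0, 0.0, 1.0)},
        {"feat": [0.0], "pos": (2.0, 0.0, 1.0)},
    ],
}
# Same appearance as obj_a, but the second part is shifted and rescaled.
obj_b = {
    "global": [1.0, 0.0],
    "parts": [
        {"feat": [1.0], "pos": (0.0, 0.0, 1.0)},
        {"feat": [0.0], "pos": (3.0, 1.0, 2.0)},
    ],
}
weights = [0.5, 0.5]

k_same = structure_kernel(obj_a, obj_a, weights)
k_deformed = structure_kernel(obj_a, obj_b, weights)
```

In this sketch an object compared with itself scores 1 plus the sum of the part weights, and moving or rescaling a part lowers the score, which mirrors the deformation penalty the abstract describes. Learning the weights by normalized stochastic gradient ascent is left out.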