Hybrid Deep Neural Network for Visual Phrase Detection

Lin Bai, Lina Yang, Yuanyan Tang, Lin Huo, Taoshen Li
2018-01-01
Abstract: Accurately detecting visual phrases in cluttered scenes is a challenging problem in computer vision. In this paper, we introduce a hybrid deep learning model to detect and recognize the visual phrases in an image. A key contribution of our work is modeling the object-level spatial arrangement of an image to aid the learning of high-level relational visual features, using the proposed Factored Conditional Restricted Boltzmann Machine (FCRBM). We use a deep convolutional neural network to learn an object-level representation that describes the scene and the objects of the query image. These object-level features are then fed into the FCRBM to learn high-level relational features between objects and scenes. Unlike traditional deep learning models that use either no conditioning or only bias-based conditioning, the FCRBM has a three-way multiplicative interaction structure, which lets the spatial context directly modulate the learning of the relational features. A classification RBM on top maps the relational features to the visual phrase structure label. Compared with state-of-the-art methods, our model achieves competitive visual phrase detection performance on two known datasets.
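
The abstract gives no equations for the FCRBM, so the following is only a minimal sketch of what a three-way multiplicative interaction could look like, assuming the commonly used factored parameterization in which the visible features and the context are each projected onto a shared set of factors and multiplied element-wise before driving the hidden units. All names here (`hidden_probs`, `Wv`, `Wc`, `Wh`, `b`) and all dimensions are hypothetical and are not taken from the paper.

```python
import math

def project(W, x):
    """Return W^T x, where W is a list of len(x) rows with one column per factor."""
    n_factors = len(W[0])
    return [sum(W[i][f] * x[i] for i in range(len(x))) for f in range(n_factors)]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def hidden_probs(v, c, Wv, Wc, Wh, b):
    """Hidden-unit activation probabilities under an assumed factored three-way model.

    v  : object-level features (visible units)
    c  : spatial-context vector (conditioning units)
    Wv : visible-to-factor weights, shape len(v) x F
    Wc : context-to-factor weights, shape len(c) x F
    Wh : hidden-to-factor weights, shape H x F
    b  : hidden biases, length H
    """
    fv = project(Wv, v)                      # factor responses to the features
    fc = project(Wc, c)                      # factor responses to the context
    gated = [a * g for a, g in zip(fv, fc)]  # context gates each factor multiplicatively
    pre = [sum(w * g for w, g in zip(row, gated)) for row in Wh]
    return [sigmoid(p + bj) for p, bj in zip(pre, b)]

# Toy example: 2 visible units, 1 context unit, 2 factors, 1 hidden unit.
Wv = [[1.0, 0.0], [0.0, 1.0]]
Wc = [[2.0, 3.0]]
Wh = [[1.0, 1.0]]
b = [0.0]
with_context = hidden_probs([1.0, 0.0], [1.0], Wv, Wc, Wh, b)
without_context = hidden_probs([1.0, 0.0], [0.0], Wv, Wc, Wh, b)
```

In this sketch a zero context vector switches off every factor, so the hidden units fall back to their biases. With bias-based conditioning the context would instead only add an offset to the hidden input, regardless of the visible features.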