Textual Dependency Embedding for Person Search by Language

Kai Niu, Yan Huang, Liang Wang
DOI: https://doi.org/10.1145/3394171.3413895
2020-01-01
Abstract: Person search by language aims to associate pedestrian images with free-form natural language descriptions. Although great effort has gone into aligning images with sentences, most work neglects the difficulty of modeling long-distance dependencies in textual encoding. This matters because the descriptions needed to distinguish different pedestrians are typically long and structurally complex. In this work, we focus on the long-distance dependencies in a sentence for better textual encoding, and accordingly propose the Textual Dependency Embedding (TDE) method. We first use sentence analysis tools to identify the long-distance syntactic dependencies from each dependent to its governor in a sentence. We then adaptively embed the dependent representations into their governor in our Governor-guided Dependent Attention Module (GDAM) to model these long-distance relations. We further consider the dependency types, which also indicate the semantic importance of different dependents, and embed them together with the dependents' features to clarify their unequal contributions to their governor. Extensive experiments and analysis on person search by language and image-text matching validate the effectiveness of our method, which achieves state-of-the-art performance on the CUHK-PEDES and Flickr30K datasets.
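The governor-guided attention idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `governor_guided_attention`, the additive fusion of word and dependency-type vectors, the scaled dot-product scoring, and the residual update are all assumptions made for illustration. The dependency arcs are written by hand here and stand in for the output of a sentence analysis tool.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def governor_guided_attention(word_vecs, type_vecs, arcs):
    """Hypothetical sketch: fold dependents into their governor.

    word_vecs: list of equal-length float lists, one per token.
    type_vecs: dict mapping a dependency label to a float list.
    arcs: list of (dependent_index, governor_index, label) triples.
    Returns new token vectors; tokens without dependents are unchanged.
    """
    # Group the dependents of each governor.
    by_governor = {}
    for dep, gov, label in arcs:
        by_governor.setdefault(gov, []).append((dep, label))

    out = [list(v) for v in word_vecs]
    for gov, deps in by_governor.items():
        g = word_vecs[gov]
        # Assumed fusion: dependent feature = word vector + type vector.
        feats = [
            [w + t for w, t in zip(word_vecs[d], type_vecs[label])]
            for d, label in deps
        ]
        # The governor scores each dependent (scaled dot product).
        scale = math.sqrt(len(g))
        weights = softmax([dot(g, f) / scale for f in feats])
        # Residual update: governor plus attention-weighted dependents.
        for i in range(len(g)):
            out[gov][i] += sum(w * f[i] for w, f in zip(weights, feats))
    return out


# Toy usage with hand-written arcs for "woman wearing red coat".
tokens = ["woman", "wearing", "red", "coat"]
word_vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
type_vecs = {"acl": [0.1, 0.0], "dobj": [0.0, 0.1], "amod": [0.1, 0.1]}
arcs = [(1, 0, "acl"), (3, 1, "dobj"), (2, 3, "amod")]
for token, vec in zip(tokens, governor_guided_attention(word_vecs, type_vecs, arcs)):
    print(token, vec)
```

In this sketch, the softmax weights let a governor attend unevenly to its dependents, and adding a type vector lets the dependency label shift that weighting, which mirrors the role the abstract assigns to dependency types. A trained model would learn the projections and type embeddings rather than use fixed toy vectors.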