Visual Attention Consistency for Human Attribute Recognition

Hao Guo, Xiaochuan Fan, Song Wang
DOI: https://doi.org/10.1007/s11263-022-01591-y
IF: 13.369
2022-03-05
International Journal of Computer Vision
Abstract: The recognition of a human attribute is usually determined by certain regions of the input image, e.g., certain parts of the human body, and such attribute-region relevance plays an important role in human attribute recognition. In deep networks, this attribute-region relevance can be derived as an interpretive attention map, whose highlighted areas indicate the regions that contribute most to the final recognition. Based on the assumption that more plausible attention maps indicate better networks, in this paper we propose a new approach for human attribute recognition that explores and enforces two kinds of attention consistency in network learning. The first kind enforces equivariance of the attention map when the input image undergoes certain spatial transforms, such as scaling, rotation and flipping. The second kind is enforced between the attention maps derived from two different networks when both are trained to recognize the same attribute from the same image. We formulate these two kinds of consistency as new loss functions and combine them with the traditional classification loss for network training. Experiments on three human attribute recognition datasets verify the effectiveness of the proposed method, which achieves new state-of-the-art performance.
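The two consistency terms can be illustrated with a small NumPy sketch. It is not the authors' implementation, and the abstract does not specify these details: the sketch assumes CAM-style attention maps (a classifier-weighted sum of last-layer feature channels), horizontal flipping as the spatial transform, mean squared error as the distance between maps, binary cross-entropy as the classification loss, and hypothetical loss weights `lam_flip` and `lam_cross`.

```python
import numpy as np


def class_activation_map(features, fc_weights, attr):
    """CAM-style attention map for one attribute (assumed form, not from the paper).

    features:   (C, H, W) feature map from the last conv layer
    fc_weights: (num_attrs, C) weights of a linear classifier on pooled features
    """
    # Weighted sum over channels using the classifier weights of attribute `attr`.
    return np.tensordot(fc_weights[attr], features, axes=([0], [0]))  # (H, W)


def flip_consistency_loss(att_orig, att_of_flipped):
    """Equivariance under horizontal flip: the map computed on the flipped
    image should equal the horizontally flipped map of the original image."""
    return float(np.mean((att_of_flipped - np.flip(att_orig, axis=-1)) ** 2))


def cross_network_consistency_loss(att_a, att_b):
    """Agreement between the maps of two networks for the same attribute and image."""
    return float(np.mean((att_a - att_b) ** 2))


def bce_loss(logits, labels):
    """Multi-label binary cross-entropy, assumed here as the classification loss."""
    probs = np.clip(1.0 / (1.0 + np.exp(-logits)), 1e-7, 1 - 1e-7)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))


def total_loss(cls_loss, l_flip, l_cross, lam_flip=1.0, lam_cross=1.0):
    # Hypothetical weights lam_flip / lam_cross balance the three terms.
    return cls_loss + lam_flip * l_flip + lam_cross * l_cross


rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 6, 8))  # toy features: C=4, H=6, W=8
w = rng.standard_normal((3, 4))         # classifier weights for 3 attributes
# Toy "backbone" is the identity, which is trivially flip-equivariant.
att = class_activation_map(feats, w, attr=1)
att_f = class_activation_map(np.flip(feats, axis=-1), w, attr=1)
print(flip_consistency_loss(att, att_f))  # numerically close to zero
```

Because the toy feature extractor is the identity, the flip term is essentially zero here; with a real CNN the two maps generally differ, and minimizing this term pushes the network toward flip-equivariant attention. Scaling or rotation would replace `np.flip` with the corresponding warp applied to the attention map.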
computer science, artificial intelligence