Adaptive Bilinear Pooling for Fine-grained Representation Learning.
Shaobo Min,Hongtao Xie,Youliang Tian,Hantao Yao,Yongdong Zhang
DOI: https://doi.org/10.1145/3338533.3366567
2019-01-01
Abstract:Fine-grained representation learning targets to generate discriminative description for fine-grained visual objects. Recently, the bilinear feature interaction has been proved effective in generating powerful high-order representation with spatially invariant information. However, the existing methods apply a fixed feature interaction strategy to all samples, which ignore the image and region heterogeneity in a dataset. To this end, we propose a generalized feature interaction method, named Adaptive Bilinear Pooling (ABP), which can adaptively infer a suitable pooling strategy for a given sample based on image content. Specifically, ABP consists of two learning strategies: p-order learning (P-net) and spatial attention learning (S-net). The p-order learning predicts an optimal exponential coefficient rather than a fixed order number to extract moderate visual information from an image. The spatial attention learning aims to infer a weighted score that measures the importance of each local region, which can compact the image representations. To make ABP compatible with kernelized bilinear feature interaction, a crossed two-branch structure is utilized to combine the P-net and S-net. This structure can facilitate complementary information exchange between two different visual branches. The experiments on three widely used benchmarks, including fine-grained object classification and action recognition, demonstrate the effectiveness of the proposed method.
What problem does this paper attempt to address?