PiCANet: Learning Pixel-wise Contextual Attention in ConvNets and Its Application in Saliency Detection.

Nian Liu, Junwei Han
2017-01-01
Abstract: Context plays an important role in many computer vision tasks. Previous models usually construct contextual information from the whole context region. However, not all context locations are helpful, and some of them may be detrimental to the final task. To solve this problem, we propose a novel pixel-wise contextual attention network, i.e., the PiCANet, which learns to selectively attend to informative context locations for each pixel. Specifically, it generates an attention map over the context region for each pixel, where each attention weight corresponds to the contextual relevance of a context location w.r.t. the specified pixel location. An attended contextual feature can then be constructed by using the attention map to aggregate the contextual features. We formulate PiCANet in a global form and a local form to attend to global contexts and local contexts, respectively. Our designs for the two forms are both fully differentiable, so they can be embedded into any CNN architecture for various computer vision tasks in an end-to-end manner. We take saliency detection as an example application to demonstrate the effectiveness of the proposed PiCANets. Specifically, we embed global and local PiCANets into an encoder-decoder ConvNet hierarchically. Thorough analyses indicate that the global PiCANet helps to construct global contrast while the local PiCANets help to make the feature maps more homogeneous, thus making saliency detection results more accurate and uniform. As a result, our proposed saliency model achieves state-of-the-art results on four benchmark datasets.
* This paper was previously submitted to CVPR 2017 and ICCV 2017. This is a slightly revised version based on our previous submission.
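The aggregation step described in the abstract (one attention map per pixel, used to weight the contextual features) can be illustrated with a minimal NumPy sketch of the global form. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the attention logits are produced, so random values stand in for the output of a learned sub-network, and the names `global_pixelwise_attention`, `features`, and `logits` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_pixelwise_attention(features, logits):
    """Aggregate global context separately for every pixel.

    features: (C, H, W) feature map.
    logits:   (H, W, H*W) unnormalized attention scores; for each pixel,
              one score per context location (assumed to come from a
              learned sub-network, which is not modeled here).
    Returns the attended features (C, H, W) and the weights (H, W, H*W).
    """
    C, H, W = features.shape
    weights = softmax(logits, axis=-1)       # each pixel's map sums to 1
    context = features.reshape(C, H * W)     # flatten context locations
    # Weighted sum of context features, with different weights per pixel.
    attended = np.einsum("hwk,ck->chw", weights, context)
    return attended, weights

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 5))    # C=8, H=4, W=5
logits = rng.standard_normal((4, 5, 4 * 5))  # stand-in for learned scores
attended, weights = global_pixelwise_attention(features, logits)

# With all-zero logits the attention is uniform, so every pixel receives
# the plain spatial average of the features (i.e. no selectivity).
uniform_attended, _ = global_pixelwise_attention(features, np.zeros_like(logits))
```

A local form would, under the same assumptions, restrict each pixel's context to a neighborhood window around it instead of all H*W locations. Every operation in the sketch is differentiable, which is the property that allows end-to-end training.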