Gaussian Dilated Convolution for Semantic Image Segmentation.

Falong Shen, Gang Zeng
DOI: https://doi.org/10.1007/978-3-030-00776-8_30
2018-01-01
Abstract: In semantic image segmentation, multi-scale contextual information is collected by probing the features with large dilated convolution filters or spatial pooling operations. Such enlargement of the receptive field promotes a more stable and globally consistent segmentation prediction. Dilated convolution can be treated as the combination of a sampling process and an ordinary convolution. For example, a 3 x 3 convolution with a large dilation rate picks only 9 positions in a very large window. In this paper we propose a more rational way to sample features from a very large receptive field. Specifically, Gaussian kernels are used to accumulate features around each sampled position to produce a more stable representation. We also examine the difference between up-sampling the logits and down-sampling the ground truth, and provide a theoretical explanation. We demonstrate the effectiveness of Gaussian dilated convolution on the semantic image segmentation datasets Pascal VOC 2012, Cityscapes and ADE20k. Gaussian dilated convolution consistently outperforms dilated convolution throughout our experiments, which verifies the effectiveness of this method. Code will be released for reproduction.
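The sampling view described in the abstract can be illustrated with a minimal NumPy sketch. It is one plausible reading of the idea, not the authors' implementation: the function names, the single-channel setting, and the choice of sigma are assumptions made here for illustration, and the abstract does not specify how the Gaussian width relates to the dilation rate. The sketch first accumulates features around every position with a normalized Gaussian kernel, then applies an ordinary dilated 3 x 3 convolution to the smoothed map.

```python
import numpy as np


def gaussian_kernel(size, sigma):
    """Normalized 2D Gaussian kernel of odd side length `size`."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def conv2d_same(x, kernel, dilation=1):
    """Zero-padded, same-size 2D cross-correlation with an odd-sized kernel.

    With dilation > 1 the kernel taps are spread `dilation` pixels apart,
    so a 3 x 3 kernel reads only 9 positions in a large window.
    """
    kh, kw = kernel.shape
    ph = dilation * (kh - 1) // 2
    pw = dilation * (kw - 1) // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * padded[di:di + h, dj:dj + w]
    return out


def gaussian_dilated_conv2d(x, weights, dilation, sigma):
    """Hypothetical sketch: Gaussian accumulation, then dilated convolution.

    Assumption: the Gaussian support is truncated at about 2 * sigma.
    """
    size = 2 * int(np.ceil(2 * sigma)) + 1
    smoothed = conv2d_same(x, gaussian_kernel(size, sigma))
    return conv2d_same(smoothed, weights, dilation=dilation)


if __name__ == "__main__":
    x = np.zeros((32, 32))
    x[21, 16] = 1.0  # a feature one pixel away from a sampled position
    w = np.ones((3, 3))
    plain = conv2d_same(x, w, dilation=6)
    gauss = gaussian_dilated_conv2d(x, w, dilation=6, sigma=2.0)
    print("plain dilated response at center:", plain[16, 16])
    print("Gaussian dilated response at center:", gauss[16, 16])
```

In this toy setting, a feature that falls between the 9 sampled positions is missed entirely by the plain dilated filter, while the Gaussian variant still receives a contribution from it, which is the stability argument the abstract makes.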