Mining Regional Relation from Pixel-wise Annotation for Scene Parsing

Zichen Song, Hongliang Li, Heqian Qiu, Xiaoliang Zhang
DOI: https://doi.org/10.1109/VCIP56404.2022.10008859
2022-01-01
Abstract: Scene parsing is an important and challenging task in computer vision that assigns a semantic label to each pixel in a scene. Existing scene parsing methods use only pixel-wise annotation to supervise the neural network; without regional relations, similar categories are easily misclassified in complex scenes. To address this problem, this paper proposes a Regional Relation Network (RRNet), which aims to boost scene parsing performance by mining regional relations from pixel-wise annotation. Specifically, the pixel-wise annotation is divided into many fixed regions, so that intra- and inter-regional relations can be extracted and used as supervision for the network. First, we design an intra-regional relation module that predicts the category distribution in each fixed region, which helps reduce misclassification within regions. Second, we propose an inter-regional relation module that learns the relationships among regions in scene images. Guided by the relation information extracted from the ground truth, the network is able to learn more discriminative relation representations. To validate the proposed model, we conduct experiments on three typical datasets: NYU-depth-v2, PASCAL-Context and ADE20k. Competitive results on all three datasets demonstrate the effectiveness of our method.
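The idea of deriving regional supervision from a pixel-wise label map can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the region shape, the handling of unlabeled pixels, or how inter-regional relationships are encoded. The sketch assumes square grid regions, an ignore label of 255, and cosine similarity between region distributions as a stand-in for the inter-regional relation; the names `region_category_distributions` and `region_affinity` are hypothetical.

```python
import math
from typing import List

IGNORE_LABEL = 255  # assumption: value marking unannotated pixels


def region_category_distributions(
    label_map: List[List[int]], region_size: int, num_classes: int
) -> List[List[List[float]]]:
    """Split a label map into fixed square regions and return, for each
    region, the fraction of annotated pixels that belong to each class."""
    height, width = len(label_map), len(label_map[0])
    grid = []
    for top in range(0, height, region_size):
        row = []
        for left in range(0, width, region_size):
            counts = [0] * num_classes
            for y in range(top, min(top + region_size, height)):
                for x in range(left, min(left + region_size, width)):
                    label = label_map[y][x]
                    if label != IGNORE_LABEL:
                        counts[label] += 1
            total = sum(counts)
            # A region with no annotated pixels gets an all-zero vector.
            row.append([c / total if total else 0.0 for c in counts])
        grid.append(row)
    return grid


def region_affinity(distributions: List[List[List[float]]]) -> List[List[float]]:
    """Pairwise cosine similarity between region distributions
    (an assumed proxy for an inter-regional relation target)."""
    flat = [dist for row in distributions for dist in row]

    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    return [[cosine(a, b) for b in flat] for a in flat]


if __name__ == "__main__":
    # Toy 4x4 annotation with three classes and one unlabeled pixel.
    labels = [
        [0, 0, 1, 1],
        [0, 1, 1, 1],
        [2, 2, 255, 0],
        [2, 2, 0, 0],
    ]
    dists = region_category_distributions(labels, region_size=2, num_classes=3)
    for row in dists:
        print(row)
    for row in region_affinity(dists):
        print([round(v, 3) for v in row])
```

In this sketch the per-region distributions play the role of intra-regional targets and the affinity matrix the role of an inter-regional target, both computed from the ground truth alone. How RRNet actually defines and supervises these quantities is described in the paper itself, not in the abstract.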