Boosting Scene Parsing Performance via Reliable Scale Prediction.

Hengcan Shi, Hongliang Li, Qingbo Wu, Fanman Meng, King N. Ngan
DOI: https://doi.org/10.1145/3240508.3240657
2018-01-01
Abstract: Segmenting objects at suitable scales is a key factor in improving scene parsing performance. Because scale labels are lacking, existing methods either simply average multi-scale results or predict scales with weakly-supervised models. In this paper, we propose a novel fully-supervised Scale Prediction Model. On the one hand, the proposed Scale Prediction Model learns parsing scales from strong scale supervision, which is automatically generated from the scene parsing ground truth without any extra manual annotation. On the other hand, we explore the relationship between scale and object class, and propose to use object class information to further improve the reliability of the scale prediction. The proposed Scale Prediction Model improves scale prediction accuracy by 23.1%, 20.1% and 29.3% on the NYU Depth v2, PASCAL-Context and SIFT Flow datasets, respectively. Based on the Scale Prediction Model, we design a Scale Parsing Net (SPNet) for scene parsing, which segments each object at the scale predicted by the Scale Prediction Model. Moreover, SPNet leverages the intermediate result (i.e., the object class) to refine the parsing results. The experimental results show that SPNet outperforms many state-of-the-art methods on multiple scene parsing datasets.
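The contrast the abstract draws between averaging multi-scale results and parsing each object at a predicted scale can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`average_scales`, `select_by_scale`), the nested-list score layout, and the toy numbers are all assumptions made for illustration, and the per-pixel scale map is simply given here rather than predicted by any model.

```python
from typing import List

# Hypothetical layout: scores[scale][class][row][col] holds a class score per pixel.
ScoreMap = List[List[List[float]]]  # [class][row][col]


def _argmax(values: List[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def average_scales(scores: List[ScoreMap]) -> List[List[int]]:
    """Baseline: average class scores over all scales, then label each pixel."""
    n_scales, n_classes = len(scores), len(scores[0])
    height, width = len(scores[0][0]), len(scores[0][0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            mean = [sum(scores[s][c][y][x] for s in range(n_scales)) / n_scales
                    for c in range(n_classes)]
            row.append(_argmax(mean))
        labels.append(row)
    return labels


def select_by_scale(scores: List[ScoreMap], scale_map: List[List[int]]) -> List[List[int]]:
    """Label each pixel using only the scores from the scale chosen for that pixel."""
    labels = []
    for y, scale_row in enumerate(scale_map):
        row = []
        for x, s in enumerate(scale_row):
            row.append(_argmax([class_map[y][x] for class_map in scores[s]]))
        labels.append(row)
    return labels


# Toy example: 2 scales, 2 classes, a 1x2 image (all values are made up).
scores = [
    [[[0.9, 0.2]], [[0.1, 0.8]]],  # scale 0: class 0 scores, class 1 scores
    [[[0.3, 0.9]], [[0.7, 0.1]]],  # scale 1
]
scale_map = [[1, 0]]  # assumed per-pixel scale choice, supplied by hand

averaged = average_scales(scores)
selected = select_by_scale(scores, scale_map)
```

In this toy input the two strategies disagree on both pixels, which is the situation in which a reliable per-pixel scale choice matters.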