A Single-Stream Adaptive Scene Layout Modeling Method for Scene Recognition

Qun Wang, Feng Zhu, Zhiyuan Lin, Jianyu Wang, Xiang Li, Pengfei Zhao
DOI: https://doi.org/10.1007/s00521-024-09772-1
2024-01-01
Neural Computing and Applications
Abstract: Scene recognition is a foundational problem in computer vision. Because scene images are typically composed of specific regions arranged in some layout, modeling the layouts of various scenes provides a key clue for scene recognition. Existing methods usually require an additional stream to detect regions for subsequent modeling, which accumulates errors and may miss important information. Moreover, they rely on hand-crafted features to model relations between regions, which weakens the representational ability of the layouts. In this paper, we propose a single-stream adaptive scene layout modeling approach based on a layout modeling module (LMM), which constructs layouts without an additional detection stream and adaptively captures relations by taking advantage of a graph attention network. The LMM is attached directly to a convolutional neural network: each pixel of the activation maps of the last convolutional layer is defined as a region and serves as an initial input node of the LMM. The LMM first models the layout of each region and then uses all regions, enriched with layout information, to model the entire scene. Layout relations are encoded as edges, which are inferred automatically from region co-occurrence and relative position. Our work can be understood as optimizing the features of the activation maps from a scene layout modeling perspective for scene recognition. Experimental results on MIT67, SUN397, and Places365 show that our single-stream model achieves competitive performance.
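The core idea in the abstract, treating each pixel of the last activation map as a graph node and letting attention decide how strongly regions relate, can be illustrated with a minimal sketch. This is not the authors' LMM: it assumes a fully connected graph, a single generic GAT-style attention head, random weights, and mean pooling, and it omits the paper's co-occurrence and relative-position edge encoding. The names `feature_map_to_nodes`, `graph_attention`, `weight`, and `attn_vec` are hypothetical and chosen only for illustration.

```python
import numpy as np

def feature_map_to_nodes(fmap):
    """Flatten a (C, H, W) activation map into (H*W, C) node features."""
    c, h, w = fmap.shape
    return fmap.reshape(c, h * w).T

def graph_attention(nodes, weight, attn_vec, slope=0.2):
    """Single-head GAT-style attention over a fully connected graph (assumed)."""
    proj = nodes @ weight                      # (N, D) projected node features
    d = proj.shape[1]
    # a^T [W h_i || W h_j] splits into a source term plus a target term.
    src = proj @ attn_vec[:d]                  # (N,)
    dst = proj @ attn_vec[d:]                  # (N,)
    scores = src[:, None] + dst[None, :]       # (N, N) raw pairwise scores
    scores = np.where(scores > 0, scores, slope * scores)  # LeakyReLU
    scores = scores - scores.max(axis=1, keepdims=True)    # stable softmax
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)       # rows sum to 1
    return alpha @ proj, alpha

# Toy stand-in for a CNN's last activation map: 8 channels on a 4x4 grid.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))
nodes = feature_map_to_nodes(fmap)             # 16 regions, 8 features each

# Hypothetical, untrained parameters; a real model would learn these.
weight = rng.standard_normal((8, 16)) * 0.1
attn_vec = rng.standard_normal(32) * 0.1

updated, alpha = graph_attention(nodes, weight, attn_vec)
scene_vector = updated.mean(axis=0)            # pooled scene descriptor
```

Here `alpha[i, j]` plays the role of a learned edge weight from region `j` to region `i`, and `scene_vector` is the pooled descriptor that a classifier could consume. In a trained model the attention parameters would be optimized jointly with the CNN rather than sampled at random.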