A New Lightweight Hybrid Graph Convolutional Neural Network -- CNN Scheme for Scene Classification using Object Detection Inference

Ayman Beghdadi,Azeddine Beghdadi,Mohib Ullah,Faouzi Alaya Cheikh,Malik Mallem
2024-07-20
Abstract:Scene understanding plays an important role in several high-level computer vision applications, such as autonomous vehicles, intelligent video surveillance, or robotics. However, too few solutions have been proposed for indoor/outdoor scene classification to ensure scene context adaptability for computer vision frameworks. We propose the first Lightweight Hybrid Graph Convolutional Neural Network (LH-GCNN)-CNN framework as an add-on to object detection models. The proposed approach uses the output of the CNN object detection model to predict the observed scene type by generating a coherent GCNN representing the semantic and geometric content of the observed scene. This new method, applied to natural scenes, achieves an efficiency of over 90\% for scene classification in a COCO-derived dataset containing a large number of different scenes, while requiring fewer parameters than traditional CNN methods. For the benefit of the scientific community, we will make the source code publicly available: <a class="link-external link-https" href="https://github.com/Aymanbegh/Hybrid-GCNN-CNN" rel="external noopener nofollow">this https URL</a>.
Computer Vision and Pattern Recognition,Artificial Intelligence
What problem does this paper attempt to address?
The problem that this paper attempts to solve is the adaptability and accuracy issues in **indoor/outdoor scene classification**. Specifically, the existing computer vision frameworks lack sufficient adaptability when dealing with different types of scenes (especially indoor and outdoor scenes). To solve this problem, the author proposes a new lightweight hybrid graph convolutional neural network (LH - GCNN) - CNN framework as an add - on to the object detection model. ### Main Problems and Solutions 1. **Limitations of Existing Methods**: - Most current visual SLAM (vSLAM) methods focus on specific types of environments (indoor or outdoor), and these methods usually rely on fixed assumptions, which limit their applicability in complex and dynamic environments. - Existing scene classification methods have deficiencies in dealing with natural scenes, especially in non - satellite image scene classification, and are unable to fully utilize the spatial distribution and semantic information of objects. 2. **Proposed Solutions**: - The author proposes a lightweight hybrid graph convolutional neural network (LH - GCNN) - CNN framework, which uses the output of the object detection model to predict the type of the observed scene. - By generating a coherent GCNN to represent the semantic and geometric content of the observed scene, the accuracy of scene classification is improved. - This method can achieve a classification accuracy of over 90% on the COCO - derived dataset while using far fewer parameters than traditional CNN methods. ### Key Innovation Points - **Lightweight Architecture**: The proposed LH - GCNN - CNN framework not only achieves high accuracy but also significantly reduces the number of parameters used, making it easier to deploy. - **Spatial - Semantic Graph Construction**: By combining the spatial location and semantic information of objects, a graph structure that can better describe scene features is constructed. - **Generality**: This framework can be integrated into any object detection/segmentation model, such as YOLACT, and can be extended to other similar models. ### Experimental Results The experimental results show that this method performs well on multiple metrics: - Under different numbers of object categories, the GINLAF model performs best, with an accuracy rate of 92.0%. - Compared with traditional CNN and ViT models, this method reduces the number of required parameters by 100 times and increases the inference speed by 66 times. ### Summary This paper proposes a new lightweight hybrid graph convolutional neural network (LH - GCNN) - CNN framework, which solves the adaptability and accuracy problems of existing methods in indoor/outdoor scene classification. By combining the output of the object detection model and the advantages of the graph convolutional neural network, this method has achieved remarkable results in natural scene classification tasks and has low computational cost and high flexibility.