Real-Time Indoor Object Detection based on hybrid CNN-Transformer Approach

Salah Eddine Laidoudi, Madjid Maidi, Samir Otmane
2024-09-03
Abstract: Real-time object detection in indoor settings is a challenging area of computer vision, faced with unique obstacles such as variable lighting and complex backgrounds. This field holds significant potential to revolutionize applications like augmented and mixed reality by enabling more seamless interactions between digital content and the physical world. However, the scarcity of research specifically tailored to the intricacies of indoor environments has highlighted a clear gap in the literature. To address this, our study evaluates existing datasets and computational models, leading to the creation of a refined dataset. This new dataset is derived from OpenImages v7, focusing exclusively on 32 indoor categories selected for their relevance to real-world applications. Alongside this, we present an adaptation of a CNN detection model, incorporating an attention mechanism to enhance the model's ability to discern and prioritize critical features within cluttered indoor scenes. Our findings demonstrate that this approach is not only competitive with existing state-of-the-art models in accuracy and speed but also opens new avenues for research and application in the field of real-time indoor object detection.
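Deriving an indoor dataset from OpenImages v7 amounts, at its core, to filtering an annotation table down to a chosen class subset. The sketch below shows that step on a tiny in-memory CSV. It assumes OpenImages-style box annotations with `ImageID`, `LabelName`, and normalized `XMin`/`XMax`/`YMin`/`YMax` columns. The machine IDs, the three-class `INDOOR_CLASSES` set, and `filter_indoor_boxes` are hypothetical placeholders, not the authors' 32 categories or tooling.

```python
import csv
import io
from collections import Counter

# Hypothetical subset: the paper uses 32 indoor categories; three are shown here.
INDOOR_CLASSES = {"Chair", "Table", "Laptop"}

# Placeholder machine IDs (hypothetical, NOT real OpenImages MIDs) -> display names.
LABELS = {"/m/x_chair": "Chair", "/m/x_table": "Table",
          "/m/x_laptop": "Laptop", "/m/x_car": "Car"}

def filter_indoor_boxes(box_csv, label_names, keep):
    """Keep only box rows whose class display name is in `keep`.

    box_csv: text of an OpenImages-style box CSV (assumed column layout).
    label_names: mapping from LabelName (machine ID) to display name.
    """
    kept = []
    for row in csv.DictReader(io.StringIO(box_csv)):
        name = label_names.get(row["LabelName"])
        if name in keep:
            box = tuple(float(row[k]) for k in ("XMin", "XMax", "YMin", "YMax"))
            kept.append({"ImageID": row["ImageID"], "class": name, "box": box})
    return kept

SAMPLE = """ImageID,LabelName,XMin,XMax,YMin,YMax
img1,/m/x_chair,0.10,0.40,0.20,0.90
img1,/m/x_car,0.50,0.90,0.30,0.70
img2,/m/x_laptop,0.20,0.60,0.40,0.80
img3,/m/x_car,0.00,1.00,0.00,1.00
"""

boxes = filter_indoor_boxes(SAMPLE, LABELS, INDOOR_CLASSES)
images = sorted({b["ImageID"] for b in boxes})  # images that still have boxes
counts = Counter(b["class"] for b in boxes)     # per-class box counts
print(images)        # ['img1', 'img2']
print(dict(counts))  # {'Chair': 1, 'Laptop': 1}
```

Here `img3` drops out because none of its boxes survive, while `img1` keeps its chair box but loses the car box. A real pipeline would also need a policy for images in which non-selected objects remain visible but unlabeled; this summary does not specify how the authors handled that case.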
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses real-time object detection in indoor environments. Specifically, it focuses on the following aspects:

1. **Coping with the unique challenges of indoor environments**: Indoor scenes involve changing lighting conditions, complex backgrounds, and object occlusion, all of which make object detection harder. Existing object detection models perform poorly on these specific problems, so a dedicated method is needed to improve indoor detection performance.
2. **Improving real-time performance and accuracy**: In applications such as augmented reality (AR) and mixed reality (MR), object detection must be both highly accurate and low-latency. The detection system therefore has to process frames quickly without sacrificing accuracy.
3. **Developing a dataset adapted to indoor environments**: Existing datasets such as COCO are comprehensive but not entirely suited to the challenges of indoor environments. The paper therefore creates a new dataset specifically for indoor object detection, derived from OpenImages v7 and restricted to 32 indoor categories, to better reflect the complexity of practical application scenarios.
4. **Designing an efficient model architecture**: To meet accuracy and real-time requirements simultaneously, the paper proposes a hybrid architecture that combines convolutional neural networks (CNNs) and transformers. The architecture aims to exploit the strength of CNNs in local feature extraction and the ability of transformers to integrate global information.

By addressing these problems, the paper aims to provide a new and efficient method for real-time object detection in indoor environments and to advance computer vision research in this area.
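The CNN-plus-transformer idea can be illustrated with a deliberately tiny, dependency-free sketch. A convolution stage extracts local features, each spatial position of the resulting feature maps is flattened into a token, and single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d)) V, lets every position draw on every other. This is a generic CNN-plus-attention pattern, not the authors' architecture. The hand-picked Sobel kernels, the random projection weights, and the helper names (`conv2d_valid`, `self_attention`, `hybrid_block`) are assumptions for illustration. Components a real detector needs (learned weights, multi-head attention, positional encodings, normalization, a detection head) are omitted.

```python
import math
import random

def conv2d_valid(img, kernel):
    """'Valid' 2D cross-correlation (what CNN conv layers compute) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            s = sum(img[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out

def matmul(a, b):
    # Plain list-of-lists matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    d = len(q[0])
    weights = [softmax([sum(x * y for x, y in zip(qrow, krow)) / math.sqrt(d)
                        for krow in k]) for qrow in q]
    attended = matmul(weights, v)
    out = [[t + a for t, a in zip(trow, arow)] for trow, arow in zip(tokens, attended)]
    return out, weights

def hybrid_block(img, kernels, wq, wk, wv):
    # 1) CNN stage: one feature map per kernel (local features).
    maps = [conv2d_valid(img, kern) for kern in kernels]
    h, w = len(maps[0]), len(maps[0][0])
    # 2) Flatten: each spatial position becomes a token of per-channel values.
    tokens = [[m[i][j] for m in maps] for i in range(h) for j in range(w)]
    # 3) Attention stage: every position attends to every other (global context).
    return self_attention(tokens, wq, wk, wv)

# Usage on a random 6x6 "image" (hypothetical inputs, untrained weights).
rng = random.Random(0)
img = [[rng.random() for _ in range(6)] for _ in range(6)]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # responds to horizontal edges
kernels = [SOBEL_X, SOBEL_Y]
C = len(kernels)

def rand_matrix(r, c):
    return [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]

wq, wk, wv = rand_matrix(C, C), rand_matrix(C, C), rand_matrix(C, C)
out, attn = hybrid_block(img, kernels, wq, wk, wv)
print(len(out), len(out[0]))  # 16 2
```

A 6x6 input with 3x3 kernels yields a 4x4 map per kernel, so the attention stage sees 16 tokens of dimension 2, and each row of `attn` is a probability distribution over those 16 positions. The convolution only looks at a 3x3 neighborhood, while the attention weights connect all positions. That local/global split is the motivation behind the hybrid design.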