Semi-Open Set Object Detection Algorithm Leveraged by Multi-Modal Large Language Models

Kewei Wu, Yiran Wang, Xiaogang He, Jinyu Yan, Yang Guo, Zhuqing Jiang, Xing Zhang, Wei Wang, Yongping Xiong, Aidong Men, Li Xiao
DOI: https://doi.org/10.3390/bdcc8120175
2024-01-01
Big Data and Cognitive Computing
Abstract: Closed-set object detection models, represented by YOLO, are widely deployed in industry. However, such models are difficult to tune for easily confused objects in complex detection scenarios. Open-set object detection models such as GroundingDINO broaden the range of detectable objects to some extent, but their detection accuracy still lags behind that of closed-set models and does not meet the high-precision requirements of practical applications. Existing detection technologies also lack interpretability: they cannot clearly show users the basis and reasoning process behind a detection result, which undermines users' trust in the results and their willingness to apply them. To address these deficiencies, we propose a new object detection algorithm based on multi-modal large language models that significantly improves the performance of closed-set object detection models on more difficult boundary tasks while preserving detection accuracy, thereby achieving semi-open set object detection. Evaluated on seven common traffic and production-safety scenarios, the algorithm shows significant improvements in accuracy and interpretability.