Correlation of Object Detection Performance with Visual Saliency and Depth Estimation

Matthias Bartolo, Dylan Seychell
2024-11-05
Abstract: As object detection techniques continue to evolve, understanding their relationships with complementary visual tasks becomes crucial for optimising model architectures and computational resources. This paper investigates the correlations between object detection accuracy and two fundamental visual tasks: depth prediction and visual saliency prediction. Through comprehensive experiments using state-of-the-art models (DeepGaze IIE, Depth Anything, DPT-Large, and Itti's model) on COCO and Pascal VOC datasets, we find that visual saliency shows consistently stronger correlations with object detection accuracy (mA$\rho$ up to 0.459 on Pascal VOC) compared to depth prediction (mA$\rho$ up to 0.283). Our analysis reveals significant variations in these correlations across object categories, with larger objects showing correlation values up to three times higher than smaller objects. These findings suggest incorporating visual saliency features into object detection architectures could be more beneficial than depth information, particularly for specific object categories. The observed category-specific variations also provide insights for targeted feature engineering and dataset design improvements, potentially leading to more efficient and accurate object detection systems.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper explores how depth prediction and visual saliency prediction correlate with object detection performance. Specifically, the authors measure experimentally how closely the output of each task tracks object detection accuracy, and how these correlations change across object categories and scales. The main research questions are:

1. **Correlations between depth prediction, visual saliency prediction and object detection performance**:
   - The authors use state-of-the-art models (DeepGaze IIE, Depth Anything, DPT-Large and Itti's model) on the COCO and Pascal VOC datasets to evaluate how depth prediction and visual saliency prediction correlate with object detection accuracy.
   - Visual saliency correlates more strongly with object detection accuracy (for example, on Pascal VOC the maximum mean Pearson correlation coefficient, mAρ, reaches 0.459) than depth prediction does (maximum mAρ of 0.283).
2. **Changes in correlations across object categories and scales**:
   - Larger objects show correlation values up to three times higher than smaller objects, which indicates that object size strongly affects how closely these cues track detection accuracy.
   - These category-specific differences inform targeted feature engineering and dataset design, which can help improve the efficiency and accuracy of object detection systems.

### Main contributions

- **Theoretical significance**: By quantifying the correlations between depth prediction, visual saliency prediction and object detection performance, the paper provides a basis for developing multi-task learning frameworks.
- **Practical applications**: The results provide empirical evidence for improving object detection architectures and computational efficiency, especially for object detection in complex scenes.
- **Dataset design**: The variation in correlations across object categories and scales offers guidance for dataset design, which can help improve model accuracy and robustness in practical applications.

### Conclusions

Through systematic experiments and analysis, this paper shows that visual saliency is more closely related to object detection accuracy than depth prediction is. The findings improve the understanding of how computer vision tasks relate to one another and offer practical suggestions for building more efficient and accurate object detection systems.
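
As a rough illustration of the kind of correlation measure discussed above, the following minimal sketch computes a Pearson coefficient per object category and then averages the coefficients across categories. It is not the authors' implementation: the function names `pearson` and `mean_category_correlation`, the choice of one feature value and one detection score per image, and the toy numbers are all assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_category_correlation(per_category):
    """Average the per-category Pearson coefficients.

    per_category maps a category name to a pair
    (feature_values, detection_scores), e.g. a hypothetical saliency
    summary per image and the matching detection score per image.
    Returns (mean coefficient, dict of per-category coefficients).
    """
    rhos = {name: pearson(f, s) for name, (f, s) in per_category.items()}
    return sum(rhos.values()) / len(rhos), rhos


# Toy, made-up numbers purely for illustration (not results from the paper).
toy = {
    "dog": ([0.2, 0.4, 0.6, 0.8], [0.3, 0.5, 0.6, 0.9]),
    "cup": ([0.1, 0.5, 0.3, 0.7], [0.6, 0.4, 0.5, 0.5]),
}
mean_rho, rhos = mean_category_correlation(toy)
print(f"mean correlation: {mean_rho:.3f}")
for name, rho in rhos.items():
    print(f"{name}: {rho:.3f}")
```

Keeping the per-category coefficients alongside the mean makes it possible to inspect the category-level variation that the summary highlights, rather than only the aggregate value.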