Abstract: As object detection techniques continue to evolve, understanding their relationships with complementary visual tasks becomes crucial for optimising model architectures and computational resources. This paper investigates the correlations between object detection accuracy and two fundamental visual tasks: depth prediction and visual saliency prediction. Through comprehensive experiments using state-of-the-art models (DeepGaze IIE, Depth Anything, DPT-Large, and Itti's model) on COCO and Pascal VOC datasets, we find that visual saliency shows consistently stronger correlations with object detection accuracy (mA$\rho$ up to 0.459 on Pascal VOC) compared to depth prediction (mA$\rho$ up to 0.283). Our analysis reveals significant variations in these correlations across object categories, with larger objects showing correlation values up to three times higher than smaller objects. These findings suggest that incorporating visual saliency features into object detection architectures could be more beneficial than incorporating depth information, particularly for specific object categories. The observed category-specific variations also provide insights for targeted feature engineering and dataset design improvements, potentially leading to more efficient and accurate object detection systems.
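The abstract reports results as mA$\rho$ without spelling out how it is computed. The sketch below is one plausible reading, not the authors' code: it assumes mA$\rho$ works like mAP, i.e. a Pearson correlation is computed per object category between a per-object cue score (for example, mean saliency or depth inside the ground-truth box) and a per-object detection score (for example, the IoU of the best-matching prediction), and these per-category values are then averaged. The helper names, the choice of per-object scores and the size thresholds (approximately COCO's small/medium/large convention) are all illustrative assumptions.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()  # center both variables
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0:
        return float("nan")  # undefined if either input is constant
    return float((xc * yc).sum() / denom)


def mean_group_correlation(groups, cue_scores, det_scores):
    """Pearson rho per group (e.g. object category) and the mean over groups.

    Assumption (hypothetical): mA-rho is read as the mean of per-category
    correlations, by analogy with mAP. Groups with fewer than two objects
    are skipped, and NaN correlations are left out of the mean.
    """
    groups = np.asarray(groups)
    cue_scores = np.asarray(cue_scores, dtype=float)
    det_scores = np.asarray(det_scores, dtype=float)
    per_group = {}
    for g in np.unique(groups):
        mask = groups == g
        if mask.sum() < 2:
            continue
        per_group[str(g)] = pearson(cue_scores[mask], det_scores[mask])
    valid = [r for r in per_group.values() if not np.isnan(r)]
    return per_group, (float(np.mean(valid)) if valid else float("nan"))


def size_bucket(area):
    """Hypothetical size label from box area in pixels, roughly following COCO."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Toy per-object data in arbitrary units (hypothetical): category,
    # a saliency cue score, and a detection score for each object.
    cats = ["cat", "cat", "cat", "dog", "dog", "dog"]
    saliency = [1, 2, 3, 1, 2, 3]
    det = [2, 4, 6, 6, 4, 2]
    per_cat, m_rho = mean_group_correlation(cats, saliency, det)
    print(per_cat, m_rho)  # {'cat': 1.0, 'dog': -1.0} 0.0

    # The same helper gives per-scale correlations when grouping by size bucket.
    areas = [500, 800, 5000, 6000, 20000, 30000]
    by_size, _ = mean_group_correlation([size_bucket(a) for a in areas], saliency, det)
    print(by_size)
```

Grouping objects by size bucket instead of by category reuses the same helper, which is one simple way to compare correlations for small and large objects under these assumptions.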

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper explores how depth prediction and visual saliency prediction correlate with object detection performance. Specifically, the authors analyze experimentally how these cues relate to object detection accuracy and how the correlations change across object categories and scales. The main research questions are:

1. **Correlations between depth prediction, visual saliency prediction and object detection performance**:
   - The authors use state-of-the-art models (DeepGaze IIE, Depth Anything, DPT-Large and Itti's model) on the COCO and Pascal VOC datasets to evaluate how depth prediction and visual saliency prediction correlate with object detection accuracy.
   - Visual saliency correlates more strongly with detection accuracy than depth does: on Pascal VOC, the mean Pearson correlation coefficient (mAρ) for visual saliency reaches up to 0.459, whereas depth prediction peaks at 0.283.
2. **Changes in correlations across object categories and scales**:
   - Correlation values for larger objects can be up to three times higher than those for small objects, indicating that object size strongly influences these correlations.
   - These findings offer insights for category-specific feature engineering and for dataset design, which could help improve the efficiency and accuracy of object detection systems.

### Main contributions

- **Theoretical significance**: By quantifying the correlations between depth prediction, visual saliency prediction and object detection performance, the paper provides a basis for developing multi-task learning frameworks.
- **Practical applications**: The results provide empirical evidence for improving object detection architectures and computational efficiency, especially for detection in complex scenes.
- **Dataset design**: By revealing how the correlations vary across object categories and scales, the paper offers guidance for dataset design that can help improve model accuracy and robustness in practical applications.

### Conclusions

In summary, through systematic experiments and analysis, this paper highlights the importance of visual saliency for object detection and provides useful references for future research and applications. The findings deepen the understanding of how computer vision tasks relate to one another and offer practical suggestions for developing more efficient and accurate object detection systems.

Correlation of Object Detection Performance with Visual Saliency and Depth Estimation
