The Roles of Contextual Semantic Relevance Metrics in Human Visual Processing

Kun Sun, Rong Wang
2024-10-14
Abstract: Semantic relevance metrics can capture both the inherent semantics of individual objects and their relationships to other elements within a visual scene. Numerous previous studies have demonstrated that these metrics can influence human visual processing. However, these studies often did not fully account for contextual information or employ recent deep learning models for more accurate computation. This study investigates human visual perception and processing by introducing metrics of contextual semantic relevance. We evaluate semantic relationships between target objects and their surroundings from both vision-based and language-based perspectives. Using a large eye-movement dataset collected during visual comprehension, we employ state-of-the-art deep learning techniques to compute these metrics and analyze their effects on fixation measures of human visual processing with advanced statistical models. These metrics could also simulate top-down and bottom-up processing in visual perception. This study further integrates vision-based and language-based metrics into a novel combined metric, addressing a critical gap in previous research, which often treated visual and semantic similarities separately. Results indicate that all metrics could precisely predict fixation measures in visual perception and processing, but with distinct roles in prediction. The combined metric outperforms the other metrics, supporting theories that emphasize the interaction between semantic and visual information in shaping visual perception and processing. This finding aligns with growing recognition of the importance of multimodal information processing in human cognition. These insights enhance our understanding of the cognitive mechanisms underlying visual processing and have implications for developing more accurate computational models in fields such as cognitive science and human-computer interaction.
Computer Vision and Pattern Recognition
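The abstract describes scoring how semantically related a target object is to its surroundings, computed from both vision-based and language-based representations and then merged into a combined metric. The sketch below illustrates that idea under stated assumptions: each object is already represented by an embedding vector (for instance, an embedding of its image region for the vision-based view and of its label for the language-based view; the models the authors actually used are not specified here), and relevance is the optionally distance-weighted mean cosine similarity to the other objects in the scene. The function names, the `1 / (1 + d)` distance weighting, and the mixing weight `alpha` are hypothetical choices for illustration, not the paper's formulas.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def contextual_relevance(target, context, distances=None) -> float:
    """Weighted mean cosine similarity between a target object's embedding
    and the embeddings of the other objects in the same scene.

    Hypothetical weighting: when `distances` (e.g. spatial distance from
    the target) are given, each context object gets weight 1 / (1 + d),
    so nearer objects count more. Without distances, all weights are 1.
    """
    sims = np.array([cosine(target, c) for c in context], dtype=float)
    if distances is None:
        weights = np.ones_like(sims)
    else:
        weights = 1.0 / (1.0 + np.asarray(distances, dtype=float))
    return float(np.sum(weights * sims) / np.sum(weights))


def combined_relevance(vision_score: float, language_score: float,
                       alpha: float = 0.5) -> float:
    """Hypothetical combined metric: a convex mix of the vision-based and
    language-based scores; `alpha` is an assumed weight, not from the paper."""
    return alpha * vision_score + (1.0 - alpha) * language_score


# Toy 3-D embeddings, made up for illustration (not real model outputs).
target_vis = np.array([1.0, 0.0, 0.0])
context_vis = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
target_lang = np.array([0.0, 1.0, 1.0])
context_lang = [np.array([0.0, 1.0, 1.0]), np.array([0.0, 1.0, 1.0])]

vis = contextual_relevance(target_vis, context_vis, distances=[0.0, 1.0])
lang = contextual_relevance(target_lang, context_lang)
print(round(vis, 3), round(lang, 3), round(combined_relevance(vis, lang), 3))
```

A convex mix is only the simplest way to merge the two views; the paper's combined metric may be constructed differently.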
### What problems does this paper attempt to solve?

This paper explores the role of **contextual semantic relevance metrics** in human visual processing, in particular how these metrics affect visual perception and the allocation of attention. Specifically, the researchers address the following key problems:

1. **Limitations of existing research**:
   - Although many previous studies have shown that semantic relevance metrics affect human visual processing, they often do not fully account for **contextual information**.
   - Existing methods usually do not use recent deep-learning models to compute these metrics more accurately.
2. **Multimodal information integration**:
   - Visual and linguistic information are usually treated separately. This study combines the two into a new combined metric to better capture how semantic and visual information interact during visual processing.
3. **Predicting and explaining eye-movement data**:
   - Using a large-scale eye-movement dataset, the researchers compute semantic relevance metrics with state-of-the-art deep-learning techniques and analyze their effects on eye-movement measures (such as fixation duration and number of fixations), in order to better predict and explain human visual processing.
4. **Understanding cognitive mechanisms**:
   - Using advanced statistical models (such as generalized additive mixed-effects models), the researchers aim to reveal the specific role of these metrics in human visual processing and thereby deepen our understanding of the cognitive mechanisms behind it.

### Specific objectives

- **Introduce new metrics**: Combine visual and linguistic information to develop a new method for measuring contextual semantic relevance.
- **Evaluate the effectiveness of the metrics**: Test the validity and predictive power of the new metrics against eye-movement data.
- **Explore the interaction between visual and semantic information**: Study how visual and semantic information interact in human visual processing, especially how they jointly affect attention allocation.
- **Improve computational models**: Provide more accurate computational models of human visual processing for fields such as cognitive science and human-computer interaction.

### Summary

The main purpose of this paper is to understand and predict attention allocation and the underlying cognitive mechanisms in human visual processing more comprehensively, by introducing new contextual semantic relevance metrics that combine visual and linguistic information. This helps fill gaps in existing research and provides a theoretical basis for developing more accurate computational models.
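Item 4 of the problem list mentions generalized additive mixed-effects models (GAMMs) for relating these metrics to fixation measures. A full GAMM, with smooth terms and random effects for participants and items, is usually fitted with dedicated software such as R's `mgcv` package. As a deliberately simplified stand-in, assumed here for illustration only, the sketch fits one ordinary least-squares model per metric and ranks the metrics by AIC (lower is better). `ols_aic` and `compare_metrics` are hypothetical helpers, and the data are synthetic.

```python
import numpy as np


def ols_aic(x, y) -> float:
    """Fit y ~ 1 + x by ordinary least squares and return its Gaussian AIC."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n, k = X.shape
    # Maximized Gaussian log-likelihood, with error variance estimated as rss / n.
    loglik = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)
    # k regression coefficients plus one error-variance parameter.
    return 2.0 * (k + 1) - 2.0 * loglik


def compare_metrics(metrics: dict, fixation) -> dict:
    """Return {metric name: AIC}, sorted so the lowest (best) AIC comes first."""
    scores = {name: ols_aic(values, fixation) for name, values in metrics.items()}
    return dict(sorted(scores.items(), key=lambda item: item[1]))


# Synthetic data for illustration only: durations are generated from `signal`.
rng = np.random.default_rng(0)
signal = rng.uniform(0.0, 1.0, size=200)
noise_metric = rng.uniform(0.0, 1.0, size=200)
durations = 220.0 + 80.0 * signal + rng.normal(0.0, 15.0, size=200)

ranking = compare_metrics({"signal": signal, "noise": noise_metric}, durations)
print(ranking)
```

Because `durations` were generated from `signal`, that metric should rank first. The linear, fixed-effects-only model here omits the smooths and random effects a GAMM would include, so it should not be mistaken for the paper's analysis.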