Egocentric Intention Object Prediction Based on a Human-Like Manner

Zongnan Ma, Jingru Men, Fuchun Zhang, Zhixiong Nan
DOI: https://doi.org/10.1016/j.eij.2024.100482
IF: 4.195
2024-01-01
Egyptian Informatics Journal
Abstract: This paper deals with the problem of egocentric intention object prediction, which requires a model to produce a probability map of the possible locations of human intention objects from an egocentric image of daily activities. Existing methods typically rely on visible cues (e.g., visual attention features and human hand features) to predict intention objects, implicitly assuming that intention object selection is a bottom-up process. However, when humans decide on an intention object, an invisible top-down cognitive process also takes place. This process analyzes how relevant each candidate object is to the ongoing activity (e.g., whether the object's function aligns with the activity goals) and to the overall scene (e.g., semantic context and object distances). Building on this idea, this paper introduces a multi-modal fusion mechanism that combines visible bottom-up cues with invisible top-down cues to predict intention objects in a human-like manner. Additionally, this study is the first to apply a multi-depth supervision mechanism to human intention object prediction. In experiments on two public datasets, our method outperforms eight baseline approaches, and ablation studies validate the effectiveness of the proposed mechanisms.
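The abstract describes the model only at a high level, so the sketch below is not the authors' architecture. It is a minimal illustration of the two ideas the abstract names, under stated assumptions. First, it assumes late fusion: each bottom-up cue (e.g. visual attention) and top-down cue (e.g. activity relevance) is already an H x W score map, and the maps are combined by a weighted sum and then normalized with a spatial softmax into a probability map over locations. Second, it assumes "multi-depth supervision" works like deep supervision: predictions taken at several network depths are each compared with the same ground-truth map, and their losses are summed with per-depth weights. All function names, weights, and cue values here are hypothetical.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]  # a small H x W map stored as nested lists


def spatial_softmax(scores: Grid) -> Grid:
    """Turn an H x W score map into a probability map that sums to 1."""
    peak = max(v for row in scores for v in row)  # subtract max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def fuse_cues(cue_maps: Sequence[Grid], weights: Sequence[float]) -> Grid:
    """Hypothetical late fusion: weighted sum of same-sized cue maps, then spatial softmax.

    cue_maps may mix bottom-up cues (e.g. attention, hand proximity) and
    top-down cues (e.g. activity relevance, scene context); this is an assumption.
    """
    if len(cue_maps) != len(weights):
        raise ValueError("one weight per cue map is required")
    h, w = len(cue_maps[0]), len(cue_maps[0][0])
    fused = [[sum(wt * m[i][j] for wt, m in zip(weights, cue_maps)) for j in range(w)]
             for i in range(h)]
    return spatial_softmax(fused)


def cross_entropy(target: Grid, pred: Grid, eps: float = 1e-12) -> float:
    """Cross-entropy H(target, pred) between two probability maps."""
    return -sum(t * math.log(p + eps)
                for t_row, p_row in zip(target, pred)
                for t, p in zip(t_row, p_row))


def multi_depth_loss(preds: Sequence[Grid], target: Grid,
                     depth_weights: Sequence[float]) -> float:
    """Deep-supervision-style loss: weighted sum of per-depth losses against one target."""
    return sum(wt * cross_entropy(target, p) for wt, p in zip(depth_weights, preds))


# Usage with made-up 2 x 2 cue maps (values are illustrative only).
attention = [[0.9, 0.1], [0.2, 0.0]]   # bottom-up cue
relevance = [[0.8, 0.0], [0.9, 0.1]]   # top-down cue
prob = fuse_cues([attention, relevance], weights=[1.0, 1.0])
target = [[1.0, 0.0], [0.0, 0.0]]      # ground-truth intention object at top-left
shallow = fuse_cues([attention], [1.0])  # stand-in for a prediction from a shallower depth
loss = multi_depth_loss([shallow, prob], target, depth_weights=[0.5, 1.0])
```

A fixed weighted sum keeps the fusion step easy to inspect. A trained model would more plausibly fuse feature tensors with learned layers before producing the map, so treat the weights here as placeholders.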
What problem does this paper attempt to address?