Pattern4Ego: Learning Egocentric Video Representation Using Cross-video Activity Patterns

Ruihai Wu, Yourong Zhang, Yu Qi, Andy Guanhong Chen, Hao Dong
DOI: https://doi.org/10.1145/3652583.3658010
2024-01-01
Abstract: With the development of Embodied AI, Robotics, and Augmented Reality, videos captured from the 'first-person' point of view, also known as egocentric videos, are attracting growing interest in the Computer Vision and Robotics communities. Learning a proper representation of egocentric videos can benefit diverse downstream tasks such as action forecasting and human-object interaction, which in turn benefits robotic planning. However, current works mostly focus on learning temporal or topological information for egocentric video representations, while activity patterns, which reveal the behavioral regularities or intentions of people or robots more explicitly, have not been carefully considered. In this paper, we propose a novel framework, Pattern4Ego, that learns representations of egocentric videos using cross-video activity patterns. The framework achieves state-of-the-art performance on two representative egocentric video tasks: long-term action anticipation and context-based environment affordance.