Deep-Structured Event Modeling for User-Generated Photos

Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu
DOI: https://doi.org/10.1109/tmm.2017.2788210
IF: 7.3
2017-01-01
IEEE Transactions on Multimedia
Abstract: Vision-based event analysis is difficult because of three challenges. The first is intraclass variation: photos uploaded by users are sparsely sampled visual appearances of an event over time, so each photo may capture only a single object or scene of a complex event. The second is interclass confusion: photos related to different events may contain similar objects or scenes. The third is scarcity: unusual events are rare, so only a few samples are available for learning event patterns. In this paper, by considering the photo timestamp, we propose a structured event modeling (SEM) framework for event analysis that exploits the temporal information of visual features and event classes in a photo sequence. Specifically, the temporal event patterns of the photo sequence and the relationships among different photos are jointly learned using deep neural networks (convolutional neural networks and recurrent neural networks) and a conditional random field. We evaluate the proposed SEM framework in two applications: multiclass event recognition and unusual event detection in photo sequences. Extensive experiments on a public event recognition dataset and a collected unusual event dataset demonstrate the effectiveness of the proposed method.