The Open World of Micro-Videos

Phuc Xuan Nguyen, Gregory Rogez, Charless Fowlkes, Deva Ramanan
DOI: https://doi.org/10.48550/arXiv.1603.09439
2016-04-01
Abstract: Micro-videos are six-second videos popular on social media networks with several unique properties. Firstly, because of the authoring process, they contain significantly more diversity and narrative structure than existing collections of video "snippets". Secondly, because they are often captured by hand-held mobile cameras, they contain specialized viewpoints including third-person, egocentric, and self-facing views seldom seen in traditional produced video. Thirdly, due to their continuous production and publication on social networks, aggregate micro-video content contains interesting open-world dynamics that reflect the temporal evolution of tag topics. These aspects make micro-videos an appealing well of visual data for developing large-scale models for video understanding. We analyze a novel dataset of micro-videos labeled with 58 thousand tags. To analyze this data, we introduce viewpoint-specific and temporally-evolving models for video understanding, defined over state-of-the-art motion and deep visual features. We conclude that our dataset opens up new research opportunities for large-scale video analysis, novel viewpoints, and open-world dynamics.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper focuses on four aspects:

1. **Large-scale video analysis**: The paper proposes a new large-scale video dataset containing 260,000 micro-videos labeled with 58,000 tags. These micro-videos have unique properties: high diversity, rich narrative structure, unusual viewpoints (third-person, first-person, and selfie), and open-world dynamics that evolve over time. These properties make this type of data difficult for existing computer vision methods and benchmarks to handle.
2. **Perspective modeling**: The paper emphasizes the importance of the different viewpoints found in micro-videos: third-person, first-person, and selfie. Compared with traditional video datasets, the micro-video dataset covers a wider range of viewpoints and richer narrative content. The paper analyzes these differences by introducing viewpoint-specific models, which better capture how video content changes across viewpoints.
3. **Open-world dynamics**: Micro-videos are usually accompanied by tags (hashtags), which are used not only for search but also play an important role in social communication. These tags form an open vocabulary whose usage frequency and semantics change over time. The paper explores how to learn in an open world, especially when the label distribution follows long-tail statistics and shifts over time. This provides new opportunities for exploring lifelong learning.
4. **Video understanding**: The paper also explores how to use state-of-the-art motion features and deep visual features to build video understanding models. These models are intended to handle the complex dynamics of micro-videos, including viewpoint changes and temporal evolution.
Through experiments, the paper shows the effectiveness and potential of these models for large-scale video analysis.

In conclusion, this paper aims to address the shortcomings of existing computer vision methods in handling diverse and dynamically changing content by analyzing a large-scale micro-video dataset, thereby advancing video understanding.
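
The open-world dynamics described above (an open tag vocabulary with long-tail frequencies that shift over time) can be made concrete with a small sketch. This is an illustration only, not the paper's method: the `(day_index, tags)` input format, the function names, and the toy tags are all hypothetical. It groups tag counts into time windows, measures how much of the tag mass sits in the most frequent tags, and lists tags that first appear in a given window.

```python
from collections import Counter, defaultdict


def tag_counts_by_window(videos, window_days=30):
    """Count tags per fixed-length time window.

    `videos` is an iterable of (day_index, tags) pairs, where day_index is
    the number of days since collection started (hypothetical format).
    Each tag is counted at most once per video.
    """
    windows = defaultdict(Counter)
    for day, tags in videos:
        windows[day // window_days].update(set(tags))
    return dict(windows)


def head_coverage(counts, top_k):
    """Fraction of all tag occurrences covered by the top_k most frequent tags."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(top_k)) / total


def new_tags(windows, w):
    """Tags that occur in window w but in no earlier window."""
    seen = set()
    for k, counts in windows.items():
        if k < w:
            seen.update(counts)
    return set(windows.get(w, ())) - seen


# Toy data (made up for illustration): one tag emerges in the second window.
videos = [
    (0, ["cat", "funny"]),
    (3, ["cat"]),
    (10, ["funny", "dance"]),
    (35, ["cat", "icebucket"]),
    (40, ["icebucket"]),
    (41, ["icebucket", "funny"]),
]

windows = tag_counts_by_window(videos, window_days=30)
emerging = new_tags(windows, 1)
coverage = head_coverage(windows[1], top_k=1)
print(emerging, coverage)
```

On real data, a high `head_coverage` for a small `top_k` would indicate a long-tailed tag distribution, and a non-empty `new_tags` result in later windows would indicate that the vocabulary keeps growing.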