Special Issue on Multimedia Event Detection

Thomas B. Moeslund, Omar Javed, Yu-Gang Jiang, R. Manmatha
DOI: https://doi.org/10.1007/s00138-013-0586-x
IF: 2.983
2013-01-01
Machine Vision and Applications
Abstract: Recently, the generation, storage and sharing of multimedia video data have increased at an astronomical rate. In 2012, over 100 hours of video were uploaded to YouTube every minute. The multimedia data being shared covers a wide variety, ranging from homemade birthday videos to professionally produced comedy skits, and from woodworking tutorials to breaking news reports and analysis. Although the storage and dissemination capacity of the network has grown exponentially, the development of automatic tools to search and retrieve this data has not kept pace, and by and large, video search still relies on manual annotation and categorization. Besides being expensive and slow, manual annotation cannot express the rich content of video data. A layperson or an analyst might want to search videos not only by their main topic (e.g., a news report of a protest) but also by the events taking place, the activities of the people and entities being viewed, the conversations taking place and the sounds recorded. To automatically detect, classify and index everything that can