Video Query: Research Directions
R. Bolle, B. Yeo, M. Yeung, IBM J. Res.
Abstract: As digital video databases become more and more pervasive, finding video in large databases becomes a major problem. Because of the nature of video (streamed objects), accessing the content of such databases is inherently a time-consuming operation. Enabling intelligent means of video retrieval and rapid video viewing through the processing, analysis, and interpretation of visual content are, therefore, important topics of research. In this paper, we survey the art of video query and retrieval and propose a framework for video-query formulation and video retrieval based on an iterated sequence of navigating, searching, browsing, and viewing. We describe how the rich information media of video in the forms of image, audio, and text can be appropriately used in each stage of the search process to retrieve relevant segments. Also, we address the problem of automatic video annotation, that is, attaching meanings to video segments to aid the query steps. Subsequently, we present a novel framework of structural video analysis that focuses on the processing of high-level features as well as low-level visual cues. This processing augments the semantic interpretation of a wide variety of long video segments and assists in the search, navigation, and retrieval of video. We describe several such techniques.
More and more video is generated every day. Today, much of this data is produced and stored in some analog form such as VHS video or motion pictures. But the trend is toward total digitization of film and video, and with the arrival of cheaper digital-storage devices, it becomes economically feasible to digitize video data and store and transmit it in some sort of digital form. Eventually, all storage and transport mechanisms to television receivers and computer displays will be dominated by digital technologies [1]. These technologies include CD-ROM, video tape recorders, telecommunication networks, cable, and terrestrial and satellite transmission.
The digital form allows processing of the video data to generate appropriate data abstractions that permit flexible video-database organization and enable content-based retrieval of video. That is, very much as today's large text databases can be searched with text queries, it will be possible to search video databases with combined text and visual queries. Video clips, possibly very short, will be retrieved from longer sequences in large databases on the basis of some sort of organization of the time-oriented structure of the video, and, more interestingly, on the basis of the semantic video content. For the latter case, an example of video …