Automatic Genre and Show Identification of Broadcast Media

Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain
DOI: https://doi.org/10.48550/arXiv.1606.03333
2016-06-10
Abstract: Huge amounts of digital videos are being produced and broadcast every day, leading to giant media archives. Effective techniques are needed to make such data accessible further. Automatic meta-data labelling of broadcast media is an essential task for multimedia indexing, where it is standard to use multi-modal input for such purposes. This paper describes a novel method for automatic detection of media genre and show identities using acoustic features, textual features or a combination thereof. Furthermore the inclusion of available meta-data, such as time of broadcast, is shown to lead to very high performance. Latent Dirichlet Allocation is used to model both acoustics and text, yielding fixed dimensional representations of media recordings that can then be used in Support Vector Machines based classification. Experiments are conducted on more than 1200 hours of TV broadcasts from the British Broadcasting Corporation (BBC), where the task is to categorise the broadcasts into 8 genres or 133 show identities. On a 200-hour test set, accuracies of 98.6% and 85.7% were achieved for genre and show identification respectively, using a combination of acoustic and textual features with meta-data.
Multimedia, Computation and Language, Information Retrieval
What problem does this paper attempt to address?
This paper addresses automatic metadata labelling of broadcast media, specifically the automatic identification of a broadcast's genre and of the show it belongs to. The volume of digital video produced and broadcast every day has created huge media archives, and effective techniques are needed to make that data accessible. Automatic metadata labelling is essential for multimedia indexing, where multi-modal input (acoustic features, textual features, or a combination of the two) is standard. The paper presents a novel method that uses acoustic features, textual features or both, together with available metadata such as time of broadcast, to detect media genre and show identity. Latent Dirichlet Allocation (LDA) is used to model both the acoustics and the text, yielding a fixed-dimensional representation of each media recording, which is then classified with Support Vector Machines (SVMs). The experiments use more than 1,200 hours of television broadcasts from the British Broadcasting Corporation (BBC), and the task is to assign each broadcast to one of 8 genres or one of 133 show identities. On a 200-hour test set, combining acoustic and textual features with metadata gave accuracies of 98.6% for genre identification and 85.7% for show identification. In short, the paper aims to build an accurate automatic system for genre and show identification of large-scale broadcast media, in order to improve the management and retrieval of media content.
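The LDA-then-SVM idea described above can be illustrated for the textual branch alone with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, a made-up toy corpus with two hypothetical genres, an arbitrary topic count, and a linear kernel, and it leaves out the acoustic features and the broadcast-time metadata entirely.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Hypothetical toy transcripts and genre labels (illustrative only,
# not taken from the BBC data used in the paper).
transcripts = [
    "goal striker match referee penalty goal",
    "match league striker goal keeper",
    "minister election parliament vote policy",
    "vote parliament minister budget election",
    "referee penalty league match keeper",
    "policy budget election minister vote",
]
genres = ["sport", "sport", "news", "news", "sport", "news"]

N_TOPICS = 4  # assumed value; the source does not state a topic count


def fit_genre_classifier(texts, labels, n_topics=N_TOPICS, seed=0):
    """Fit bag-of-words counts -> LDA topic posteriors -> SVM."""
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lda.fit(counts)
    # Each recording becomes a fixed-length vector of topic proportions,
    # regardless of how long its transcript is.
    topic_posteriors = lda.transform(counts)
    svm = SVC(kernel="linear")  # kernel choice is an assumption
    svm.fit(topic_posteriors, labels)
    return vectorizer, lda, svm


def to_topic_features(model, texts):
    """Map raw texts to their fixed-dimensional LDA representation."""
    vectorizer, lda, _ = model
    return lda.transform(vectorizer.transform(texts))


def predict_genre(model, texts):
    """Predict a genre label for each text."""
    return model[2].predict(to_topic_features(model, texts))


model = fit_genre_classifier(transcripts, genres)
features = to_topic_features(model, transcripts)
predictions = predict_genre(model, ["striker goal penalty match"])
print(features.shape, predictions)
```

The point of the sketch is the shape of the data: every recording, whatever its length, is reduced to a vector of `N_TOPICS` topic proportions before it reaches the classifier. On a corpus this small the predicted label is not meaningful.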