TV Commercial Classification by Using Multi-Modal Textual Information

Yantao Zheng, Lingyu Duan, Qi Tian, Jesse S. Jin
DOI: https://doi.org/10.1109/ICME.2006.262434
2006-01-01
Abstract: In this paper, we propose an approach for classifying TV commercial videos by the category of the advertised product or service (e.g., automobiles, healthcare products). Since automatic speech recognition (ASR) and optical character recognition (OCR) can deliver meaningful textual information related to products or services, TV commercial video classification is formulated as a text categorization problem. However, there are two challenges. First, the background music of TV commercials causes ASR to produce erroneous and deficient transcripts. Second, even if ASR and OCR worked perfectly, the limited textual information in TV commercials would not suffice to train a generic, non-overfitting text categorizer. For the first issue, our approach turns to external resources to expand the deficient ASR and OCR transcripts. The ASR and OCR transcripts are parsed to yield a few keywords, on which a Web search is executed to retrieve relevant and semantically informative articles from the World Wide Web (WWW). The retrieved articles are then used to construct textual feature vectors and to perform text categorization on behalf of the commercials. For the second issue, a topic-wise document corpus is constructed from public corpora such as Reuters-21578, or from articles manually collected from YAM, to train the text categorizers. Experimental results show that the proposed approach alleviates the negative effects of weak ASR/OCR performance and yields a promising classification accuracy of 80.9%.
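The pipeline described in the abstract (transcript, then keywords, then Web search, then expanded text, then a categorizer trained on a topic-wise corpus) can be illustrated with a minimal, standard-library-only sketch. The abstract does not specify how keywords are chosen, how features are weighted, or which classifier is used, so everything below is an assumption: keywords are simply the most frequent content words, features are TF-IDF weights, and a nearest-centroid cosine classifier stands in for the paper's text categorizer. `search_fn` and `fake_search` are hypothetical placeholders for a real Web search engine, and the toy corpus is invented for illustration; it is not Reuters-21578 or YAM data.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "for", "with",
             "your", "is", "in", "on", "new", "our", "you", "it", "at"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords and very short tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS and len(t) > 2]


def extract_keywords(transcript, k=3):
    """Return the k most frequent content words of a noisy ASR/OCR transcript."""
    return [w for w, _ in Counter(tokenize(transcript)).most_common(k)]


def fit_idf(token_lists):
    """Smoothed inverse document frequency over the training corpus."""
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in training are ignored."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train_centroids(corpus, idf):
    """corpus maps category -> list of article texts; returns the mean TF-IDF vector per category."""
    centroids = {}
    for category, texts in corpus.items():
        total = defaultdict(float)
        for text in texts:
            for t, w in vectorize(tokenize(text), idf).items():
                total[t] += w
        centroids[category] = {t: w / len(texts) for t, w in total.items()}
    return centroids


def classify_commercial(transcript, search_fn, centroids, idf):
    """Keywords -> search_fn(query) -> expanded text -> most similar category centroid."""
    keywords = extract_keywords(transcript)
    articles = search_fn(" ".join(keywords)) if keywords else []
    # Keep the transcript itself so an empty search result still yields a vector.
    expanded = " ".join([transcript, *articles])
    vec = vectorize(tokenize(expanded), idf)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


if __name__ == "__main__":
    corpus = {  # toy stand-in for a topic-wise training corpus
        "automobile": ["the new sedan engine offers fuel economy and safe driving",
                       "car dealers report strong demand for vehicles with hybrid engines"],
        "healthcare": ["the vitamin supplement supports immune health and energy",
                       "doctors recommend the medicine for relief of cold symptoms"],
    }
    idf = fit_idf([tokenize(t) for texts in corpus.values() for t in texts])
    centroids = train_centroids(corpus, idf)

    def fake_search(query):
        # Hypothetical offline replacement for a real Web search engine.
        pages = {"car": "this car has a powerful engine and great fuel economy for driving"}
        return [page for key, page in pages.items() if key in query.split()]

    noisy_asr = "drive the car ... car ... feel the power"
    print(classify_commercial(noisy_asr, fake_search, centroids, idf))  # prints "automobile"
```

In this toy run the transcript alone shares only the word "car" with the training corpus; the retrieved article contributes "engine", "fuel", "economy", and "driving", which is the kind of expansion the abstract credits with compensating for deficient ASR/OCR output. Swapping the nearest-centroid rule for a stronger classifier would not change the overall flow.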