A Method For Content-Based News Story Classification In Data Mining

Z Lei, Ld Wu, Sy Lao, C Wang
2004-01-01
Abstract: Multimedia data mining is a subfield of data mining that deals with extracting high-level multimedia information and implicit knowledge from large multimedia databases, and classification is one of the data mining modules used to mine knowledge from such databases. This paper presents a new method for automatically detecting anchorperson shots in digital TV news programs. Video OCR is then used to extract text from the news video stream. Finally, a Transductive Support Vector Machine (TSVM) is used, for the first time, to automatically classify news stories based on the text obtained from the OCR process. TSVM takes a particular test set into account and tries to minimize misclassifications of just those examples. Experimental results show that TSVM outperforms other learning algorithms such as Decision Trees and SVM, especially for small training sets.
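The abstract does not state the TSVM objective. As a hedged illustration, the standard soft-margin formulation from the TSVM literature (Joachims, 1999) is sketched below; it may differ in detail from what the authors used. The notation is an assumption introduced here and does not come from the abstract: labeled training examples $(x_i, y_i)$ for $i = 1, \dots, n$, unlabeled test examples $x_j^{*}$ for $j = 1, \dots, k$ with unknown labels $y_j^{*}$, and trade-off parameters $C$ and $C^{*}$.

```latex
% Assumed notation (not from the abstract): (x_i, y_i) labeled, x_j^* unlabeled test points.
\begin{aligned}
\min_{y_1^{*},\dots,y_k^{*},\; w,\; b,\; \xi,\; \xi^{*}} \quad
  & \tfrac{1}{2}\lVert w \rVert^{2}
    + C \sum_{i=1}^{n} \xi_i
    + C^{*} \sum_{j=1}^{k} \xi_j^{*} \\
\text{subject to} \quad
  & y_i \,(w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,n, \\
  & y_j^{*} \,(w \cdot x_j^{*} + b) \ge 1 - \xi_j^{*}, \qquad \xi_j^{*} \ge 0, \qquad j = 1,\dots,k, \\
  & y_j^{*} \in \{-1, +1\}, \qquad j = 1,\dots,k.
\end{aligned}
```

Because the test labels $y_j^{*}$ are themselves optimization variables, the separating hyperplane is chosen to fit the specific test examples at hand. That is consistent with the abstract's claim that the advantage is largest when the labeled training set is small.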