Unsupervised two-stage keyword extraction from spoken documents by topic coherence and support vector machine

Yun-Nung Chen, Yu Huang, Hung-yi Lee, Lin-Shan Lee
DOI: https://doi.org/10.1109/ICASSP.2012.6289053
2012-01-01
ICASSP
Abstract: This paper proposes an unsupervised two-stage approach to automatically extract keywords from spoken documents. In the first stage, we compute for each candidate term a topic coherence and term significance measure (TCS) based on probabilistic latent semantic analysis (PLSA) models. In the second stage, we take the candidate terms with the highest and lowest TCS scores as positive and negative examples, respectively, to train an SVM classifier in an unsupervised way using prosodic, lexical, and semantic features, and then use this classifier to decide which candidate terms are keywords. Experiments on course lectures showed that the first stage offered very good precision, so the examples it supplied allowed the second stage to extract the keywords effectively.
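The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the TCS formula, so `tcs_proxy` is a hypothetical stand-in (log term frequency as significance, times how sharply the PLSA topic posterior P(z | t) peaks above uniform as coherence). The SVM is a hand-rolled linear SVM trained with Pegasos-style stochastic subgradient descent, swapped in so the sketch has no dependencies; the paper does not say which solver it used. All term names, PLSA parameters, and the two-dimensional feature vectors (standing in for the prosodic, lexical, and semantic features) are made up for illustration.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def topic_posterior(p_t_given_z: Sequence[float], p_z: Sequence[float]) -> List[float]:
    """P(z | t) by Bayes' rule from PLSA parameters P(t | z) and P(z)."""
    joint = [pt * pz for pt, pz in zip(p_t_given_z, p_z)]
    total = sum(joint)
    return [j / total for j in joint]


def tcs_proxy(freq: int, p_t_given_z: Sequence[float], p_z: Sequence[float]) -> float:
    """HYPOTHETICAL stand-in for the paper's TCS score (formula not given in the abstract).

    Significance: log(1 + term frequency).
    Coherence: how far the term's topic posterior peaks above uniform (0 if uniform).
    """
    post = topic_posterior(p_t_given_z, p_z)
    coherence = max(post) - 1.0 / len(post)
    return math.log1p(freq) * coherence


def pseudo_label(scores: Dict[str, float], n_pos: int, n_neg: int) -> Tuple[List[str], List[str]]:
    """Stage-2 seeds: highest-scoring terms as positives, lowest-scoring as negatives."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_pos], ranked[-n_neg:]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Linear SVM via Pegasos-style stochastic subgradient descent on the hinge loss.

    Labels are +1/-1. A constant 1.0 feature is appended so the last weight acts as a
    bias (this also regularizes the bias, a simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the L2 regularizer
            if margin < 1.0:  # hinge loss active: move toward this example's side
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 = keyword, -1 = non-keyword."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


# Hypothetical toy data: two PLSA topics with uniform P(z); per-term (frequency, P(t | z)).
P_Z = [0.5, 0.5]
TERMS = {
    "entropy": (8, [0.09, 0.01]),
    "viterbi": (6, [0.02, 0.08]),
    "phoneme": (4, [0.06, 0.015]),
    "okay": (20, [0.011, 0.009]),
    "thing": (12, [0.02, 0.018]),
    "the": (50, [0.05, 0.05]),
}
# Hypothetical 2-D feature vectors standing in for prosodic/lexical/semantic features.
FEATURES = {
    "entropy": [1.2, 2.0], "viterbi": [1.0, 1.8], "phoneme": [1.1, 1.9],
    "okay": [-0.9, 0.2], "thing": [-0.8, 0.3], "the": [-1.0, 0.1],
}

# Stage 1: score every candidate term.
tcs = {term: tcs_proxy(f, p, P_Z) for term, (f, p) in TERMS.items()}
# Stage 2: seed an SVM with the extremes of the ranking, then classify the rest.
positives, negatives = pseudo_label(tcs, n_pos=2, n_neg=2)
X = [FEATURES[term] for term in positives + negatives]
y = [1] * len(positives) + [-1] * len(negatives)
w = train_linear_svm(X, y)
seeds = set(positives) | set(negatives)
labels = {term: predict(w, FEATURES[term]) for term in TERMS if term not in seeds}
print(positives, negatives, labels)
```

With these made-up numbers, the two highest-scoring terms ("entropy", "viterbi") become the positive seeds and the two lowest ("thing", "the") the negative seeds; the SVM then labels the terms in the middle of the ranking. The design choice mirrors the abstract's argument: the scheme only works if stage 1 is precise at the extremes of its ranking, because those extremes are the only supervision stage 2 receives. In practice a library SVM (for example scikit-learn's `sklearn.svm.SVC`) would normally replace the hand-rolled trainer.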