A Novel Topic Model for Automatic Term Extraction

Sujian Li, Jiwei Li, Tao Song, Wenjie Li, Baobao Chang
DOI: https://doi.org/10.1145/2484028.2484106
2013-01-01
Abstract: Automatic term extraction (ATE) aims to extract domain-specific terms from a corpus belonging to a given domain. Termhood is an essential measure for judging whether a phrase is a term. Previous research on termhood has relied mainly on word frequency information. In this paper, we propose to compute termhood from the semantic representation of words. We develop a novel topic model, i-SWB, that maps the domain corpus into a latent semantic space composed of several general topics, a background topic, and a document-specific topic. Experiments on four domains demonstrate that our approach outperforms state-of-the-art ATE approaches.
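The abstract does not give i-SWB's generative process. To make the three-part latent space concrete, the sketch below samples words under a plain SWB-style ("special words with background") model. The model's name and its split into general topics, a background topic, and a document-specific topic suggest that i-SWB extends SWB, but this is an assumption. i-SWB's own modifications and the paper's termhood formula are not modeled here. Every function and parameter name (generate_document, theta_d, phi, psi_d, omega, lam_d) is hypothetical. For each word, a switch variable selects one of three routes: a general topic drawn from the document's topic mixture, the document-specific word distribution, or the corpus-wide background distribution.

```python
import random

def sample_categorical(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off

def generate_document(n_words, theta_d, phi, psi_d, omega, lam_d, vocab, rng):
    """Sample one document under an SWB-style model (hypothetical parameters).

    theta_d: the document's mixture over K general topics
    phi:     K word distributions, one per general topic
    psi_d:   the document-specific word distribution
    omega:   the corpus-wide background word distribution
    lam_d:   switch probabilities [general topic, document-specific, background]
    """
    words, routes = [], []
    for _ in range(n_words):
        # Switch: 0 = general topic, 1 = document-specific, 2 = background
        x = sample_categorical(lam_d, rng)
        if x == 0:
            z = sample_categorical(theta_d, rng)  # pick a general topic
            dist = phi[z]
        elif x == 1:
            dist = psi_d
        else:
            dist = omega
        words.append(vocab[sample_categorical(dist, rng)])
        routes.append(x)
    return words, routes
```

In an ATE setting, inference would run in the opposite direction: given the corpus, a model of this kind estimates which route most likely produced each word. Intuitively, domain terms should concentrate in the general topics rather than in the background or document-specific distributions. How the paper converts the inferred distributions into a termhood score is not stated in the abstract.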