Text Segmentation with LDA-based Fisher Kernel

Qi Sun, Runxin Li, Dingsheng Luo, Xihong Wu
DOI: https://doi.org/10.3115/1557690.1557768
2008-01-01
Abstract: In this paper we propose a domain-independent text segmentation method that consists of three components. Latent Dirichlet allocation (LDA) is employed to compute the semantic distribution of words, and semantic similarity is measured with the Fisher kernel. Finally, the globally best segmentation is found by dynamic programming. Experiments on Chinese data sets show that the technique can be effective. By introducing latent semantic information, our algorithm is robust to irregularly sized segments.
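To illustrate the final step, the following is a minimal sketch of choosing globally optimal segment boundaries by dynamic programming. It is not the authors' implementation. It assumes each sentence has already been mapped to a topic vector (for example, by an LDA model), it swaps in the squared Euclidean distance to the segment mean as a stand-in for the Fisher-kernel similarity, and it assumes the number of segments is given. The names `segment_cost`, `segment`, and `num_segments` are hypothetical.

```python
from typing import List, Sequence


def segment_cost(vectors: Sequence[Sequence[float]], start: int, end: int) -> float:
    """Sum of squared distances of vectors[start:end] to their mean.

    Assumption: a stand-in for a kernel-based coherence score.
    """
    block = vectors[start:end]
    dim = len(block[0])
    mean = [sum(v[d] for v in block) / len(block) for d in range(dim)]
    return sum((v[d] - mean[d]) ** 2 for v in block for d in range(dim))


def segment(vectors: Sequence[Sequence[float]], num_segments: int) -> List[int]:
    """Return the start indices of segments 2..K that minimize the total cost."""
    n = len(vectors)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(vectors)")
    inf = float("inf")
    # best[k][j]: minimal cost of splitting the first j sentences into k segments.
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    # back[k][j]: start index of the last segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                if best[k - 1][i] == inf:
                    continue
                candidate = best[k - 1][i] + segment_cost(vectors, i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i
    # Follow the back-pointers from the end of the text to recover boundaries.
    boundaries = []
    j = n
    for k in range(num_segments, 1, -1):
        j = back[k][j]
        boundaries.append(j)
    return boundaries[::-1]


# Toy topic vectors: three sentences on one topic, then two on another.
topic_vectors = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1], [0.1, 0.9], [0.2, 0.8]]
print(segment(topic_vectors, 2))  # → [3]
```

Because the search covers every possible boundary placement, segments of very different lengths are handled the same way as evenly sized ones. This sketch recomputes each segment cost from scratch, so it is only suitable for short inputs.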