Multi-source Topic Detection Analysis Based on Improved ccLDA Model

Xingshu CHEN, Chenxi MA, Wenxian WANG, Yue GAO, Haizhou WANG
DOI: https://doi.org/10.15961/j.jsuese.201700626
2018-01-01
Abstract: The ccLDA (cross-collection LDA) model is applicable only to data sources whose topics are highly similar, and it forces the global topics to align with the local topics of each data source, which leads to word sparsity. To address these problems, an improved ccLDA topic model (IccLDA) is proposed. During sampling, the model first decides whether a word belongs to a global topic or a local topic, and then samples accordingly. This avoids the requirement in ccLDA that global and local topics be aligned, and it reduces the dispersion of words across global and local topics, making the model suitable for multi-source scenarios. Multi-source topic discovery experiments were conducted on public data sets, together with a comparative analysis of the topics found. The results showed that the perplexity of IccLDA is lower than that of both LDA and ccLDA, indicating that IccLDA has better modeling ability. Finally, further experiments were performed on data sets from real-world scenarios. The results showed that the improved model not only has better modeling ability than the traditional models, but also effectively discovers the common topics discussed across data sources and the local topics specific to each data source, making it better suited to topic discovery in multi-source scenarios.
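The two-stage sampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the switch probability `p_global`, and the two weight vectors are hypothetical, and an actual Gibbs sampler would compute these quantities from count statistics rather than receive them as inputs. The sketch only shows the order of decisions (scope first, then topic) and that the global and local topic sets may differ in size, so no alignment between them is needed.

```python
import random


def sample_word_assignment(rng, p_global, global_topic_weights, local_topic_weights):
    """Draw (scope, topic) for one word: choose the scope first, then the topic.

    All arguments are illustrative assumptions, not quantities defined in the paper.
    """
    # Stage 1: decide whether the word is generated by a global or a local topic.
    if rng.random() < p_global:
        scope, weights = "global", global_topic_weights
    else:
        scope, weights = "local", local_topic_weights

    # Stage 2: sample a topic index only within the chosen scope.
    topic = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return scope, topic


if __name__ == "__main__":
    rng = random.Random(0)
    # Three global topics and two local topics: the sets are independent.
    print(sample_word_assignment(rng, 0.6, [0.5, 0.3, 0.2], [0.7, 0.3]))
```

Because each scope has its own weight vector, a local topic index carries no implied correspondence to the global topic with the same index.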