Correlated industries mining for Chinese financial news based on LDA trained with research reports

Liwei Yan, Bo Bai
DOI: https://doi.org/10.1109/ISCIT.2016.7751607
2016-01-01
Abstract: The application of latent Dirichlet allocation (LDA) to text analysis has received much attention because LDA can characterize the hidden topics of documents within a Bayesian framework. In this paper, we train an LDA model on financial research reports to predict, for a given piece of financial news, the most correlated industries among the 24 first-level industries of the Chinese market. Because the topics of the tagged research reports are more concentrated than those of news articles, we compute optimal industry topic distributions with the least loss of information to overcome this mismatch. The Jensen-Shannon divergence is then used to mine the correlated industries. The promising performance of this method provides a solid foundation for financial information retrieval and event-driven research.
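The ranking step can be pictured with a minimal Python sketch, under stated assumptions. It assumes each industry and each news item is already represented by a topic distribution from a trained LDA model (training is not shown). It also assumes an industry's distribution is the average of its reports' topic distributions. That average is the distribution that minimizes the summed KL divergence from the reports, which is one possible reading of "least loss of information", but the paper's own construction may differ. Industries are ranked by ascending Jensen-Shannon divergence to the news item. The function names, industry labels, and three-topic toy distributions are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; topics with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def industry_distribution(report_topics: List[Sequence[float]]) -> List[float]:
    """ASSUMPTION (hypothetical helper): average the topic distributions of
    one industry's reports. The mean minimizes sum_i KL(p_i || q) over q;
    the paper's 'least loss of information' method may not be this."""
    n = len(report_topics)
    k = len(report_topics[0])
    return [sum(p[j] for p in report_topics) / n for j in range(k)]


def rank_industries(
    news_topics: Sequence[float],
    industry_topics: Dict[str, Sequence[float]],
    top_n: int = 3,
) -> List[str]:
    """Return the top_n industries with the smallest JS divergence to the news."""
    scores = {
        name: js_divergence(news_topics, dist)
        for name, dist in industry_topics.items()
    }
    return sorted(scores, key=scores.get)[:top_n]


# Toy data over 3 topics (hypothetical, not from the paper).
industries = {
    "banking": industry_distribution([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]),
    "real_estate": industry_distribution([[0.1, 0.8, 0.1]]),
    "healthcare": industry_distribution([[0.05, 0.05, 0.9]]),
}
news = [0.6, 0.3, 0.1]
print(rank_industries(news, industries, top_n=2))  # ['banking', 'real_estate']
```

With base-2 logarithms the Jensen-Shannon divergence stays in [0, 1]. Unlike KL divergence, it is symmetric and remains finite when one distribution assigns zero probability to a topic the other uses, which makes it a convenient score for comparing a news item against industry topic profiles.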