Summarization of Corporate Risk Factor Disclosure Through Topic Modeling.

Yang Bao, Anindya Datta
2012-01-01
International Conference on Information Systems
Abstract: We propose a new problem: summarizing textual corporate risk factor disclosures by simultaneously inferring the risk types across a corpus and assigning each risk factor to its most probable risk type. To solve it, we develop Sent-LDA, a variant of the LDA topic model, and derive and implement a variational EM learning algorithm for it that guarantees fast convergence. Experiments show that our model is both more efficient and more effective than LDA on this problem: it runs 50 times faster than LDA under the same conditions and generates better topics for summarization. The model is visualized in a publicly available system.
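The assignment step named in the abstract (giving each risk factor its most probable risk type) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, as the name Sent-LDA suggests, that every word in a sentence shares a single topic, so a sentence's score for topic k is the document's topic weight times the product of that topic's word probabilities. The function name `assign_sentence_topics`, the parameters `theta` and `beta`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def assign_sentence_topics(sentences, theta, beta):
    """Return the index of the highest-scoring topic for each sentence.

    sentences: list of sentences, each a list of integer word ids
    theta:     topic proportions for the document, length K (assumed given)
    beta:      beta[k][w] is the probability of word w under topic k (assumed given)
    """
    assignments = []
    for words in sentences:
        scores = []
        for k, weight in enumerate(theta):
            # Work in log space so long sentences do not underflow.
            score = math.log(weight) + sum(math.log(beta[k][w]) for w in words)
            scores.append(score)
        # Pick the topic with the largest log score.
        assignments.append(max(range(len(scores)), key=scores.__getitem__))
    return assignments


# Toy values, invented for illustration: 2 topics over a 3-word vocabulary.
theta = [0.5, 0.5]
beta = [
    [0.7, 0.2, 0.1],  # hypothetical topic 0 favors word 0
    [0.1, 0.2, 0.7],  # hypothetical topic 1 favors word 2
]
sentences = [[0, 0, 1], [2, 1, 2]]

print(assign_sentence_topics(sentences, theta, beta))  # prints [0, 1]
```

In a real system, `theta` and `beta` would come from the fitted model rather than being set by hand. The sketch covers only the final assignment, not the variational EM learning the abstract describes.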