Financial Topic Modeling Based on the BERT-LDA Embedding

Mei Zhou, Ying Kong, Jianwu Lin
DOI: https://doi.org/10.1109/INDIN51773.2022.9976145
2022-01-01
Abstract: Topic modeling extracts useful latent topics that reflect market information from large volumes of financial news and is widely used in data mining and economic research. Traditional topic modeling approaches such as Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) lack semantic information, and short texts suffer from feature sparsity. We develop a topic clustering model based on a BERT-LDA joint embedding that takes both contextual semantics and thematic narrative into account. We cluster the document embeddings with the HDBSCAN algorithm and use a class-based TF-IDF (c-TF-IDF) method to create topic representations. Empirical results show that the BERT-LDA model is competitive with traditional and single-embedding topic models, and that it generates coherent topic words that are dissimilar across topics.
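The pipeline described in the abstract (joint embedding, clustering, c-TF-IDF topic words) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The joint embedding is assumed to be a concatenation of an LDA topic-proportion vector, scaled by a hypothetical weight `gamma`, with a BERT sentence vector; the paper's exact fusion may differ. The cluster labels are hard-coded stand-ins for HDBSCAN output (HDBSCAN marks noise points with -1). The c-TF-IDF weighting follows the formulation documented for BERTopic, W(t, c) = tf(t, c) * log(1 + A / f(t)), and tokenization is a naive whitespace split with no stop-word removal.

```python
import math
from collections import Counter, defaultdict


def joint_embedding(bert_vec, lda_vec, gamma=15.0):
    """Concatenate a gamma-scaled LDA topic vector with a BERT sentence vector.

    Assumed fusion scheme for illustration; gamma is a hypothetical weight
    that balances the thematic (LDA) part against the semantic (BERT) part.
    """
    return [gamma * p for p in lda_vec] + list(bert_vec)


def c_tf_idf_topics(docs, labels, top_n=3):
    """Return the top_n c-TF-IDF terms for each cluster label.

    W(t, c) = tf(t, c) * log(1 + A / f(t)), where tf(t, c) is the
    L1-normalized count of term t in cluster c, f(t) is the count of t
    across all clusters, and A is the average number of words per cluster.
    """
    # Merge every document of a cluster into a single "class document".
    class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        if label == -1:  # HDBSCAN labels noise points -1; skip them here
            continue
        class_counts[label].update(doc.lower().split())  # naive tokenizer

    # f(t): frequency of each term across all clusters.
    total = Counter()
    for counts in class_counts.values():
        total.update(counts)

    # A: average number of words per cluster.
    avg_words = sum(total.values()) / len(class_counts)

    topics = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        scores = {
            term: (count / n_words) * math.log(1 + avg_words / total[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        topics[label] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return topics


if __name__ == "__main__":
    docs = [
        "stocks rally as bank earnings beat",
        "bank earnings lift stocks",
        "oil prices fall on supply glut",
        "oil supply cuts lift prices",
        "random noise headline",
    ]
    # Stand-in for HDBSCAN output on the joint embeddings.
    labels = [0, 0, 1, 1, -1]
    print(c_tf_idf_topics(docs, labels))
    # {0: ['bank', 'earnings', 'stocks'], 1: ['oil', 'prices', 'supply']}
```

In the toy run, "lift" occurs in both clusters and therefore scores lower than single-cluster terms with the same in-cluster count (such as "rally"). This down-weighting of terms shared across clusters is what pushes c-TF-IDF topic words to differ between topics.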