Comparison of Topic Modelling Approaches in the Banking Context

Bayode Ogunleye, Tonderai Maswera, Laurence Hirsch, Jotham Gaudoin, Teresa Brunsdon

DOI: https://doi.org/10.3390/app13020797

2024-02-06

Abstract: Topic modelling is a prominent task for automatic topic extraction in many applications such as sentiment analysis and recommendation systems. The approach is vital for service industries to monitor their customer discussions. Traditional approaches such as Latent Dirichlet Allocation (LDA) have performed well for topic discovery; however, their results are not consistent, because these approaches suffer from data sparseness and cannot model the word order in a document. Thus, this study presents the use of Kernel Principal Component Analysis (KernelPCA) and K-means clustering in the BERTopic architecture. We have prepared a new dataset using tweets from customers of Nigerian banks and we use this to compare the topic modelling approaches. Our findings showed that KernelPCA and K-means in the BERTopic architecture produced coherent topics, with a coherence score of 0.8463.

Information Retrieval, Artificial Intelligence, Machine Learning, Computation

What problem does this paper attempt to address?

This paper compares topic modelling methods in a banking setting. Traditional methods such as Latent Dirichlet Allocation (LDA) have performed well in the past, but their results are inconsistent because they suffer from data sparsity and cannot account for word order. To address this, the authors propose using Kernel Principal Component Analysis (KernelPCA) and K-means clustering within the BERTopic architecture to extract topics. They created a new dataset of tweets from customers of Nigerian banks and used it to compare the methods experimentally. The topics produced by KernelPCA and K-means within BERTopic were coherent, with a coherence score of 0.8463. The paper also reviews the history of topic modelling, from Latent Semantic Indexing (LSI) to transformer-based language models such as BERT, and highlights the advantages and limitations of each method, particularly in handling data sparsity in social media posts and other short texts. Finally, it describes the experimental method, including data preprocessing, the algorithms used, and evaluation metrics such as the coherence score, and reports the performance of the different models.
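To make the idea of a coherence score concrete, the following is a minimal sketch that scores a topic by how often its top words co-occur in the same documents. It uses normalized pointwise mutual information (NPMI) purely as an illustrative assumption: the summary above does not say which coherence measure produced the 0.8463 figure, so values from this sketch are not comparable with it. The function name `npmi_coherence` and the toy tweets are hypothetical.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all word pairs of one topic.

    Probabilities are estimated from document-level co-occurrence.
    NPMI is an assumed, illustrative measure; it ranges from -1
    (the words never co-occur) to 1 (they always co-occur).
    """
    doc_sets = [set(doc.lower().split()) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # never co-occur: minimum NPMI by convention
        elif p12 == 1.0:
            scores.append(1.0)   # co-occur in every document: avoid 0 division
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical toy corpus standing in for preprocessed bank-customer tweets.
tweets = [
    "atm card declined again",
    "atm swallowed my card",
    "mobile app login failed",
    "app login keeps failing",
    "card blocked at atm",
    "transfer pending on app",
]

coherent_topic = ["atm", "card"]     # words that appear together
incoherent_topic = ["atm", "login"]  # words that never appear together

print(npmi_coherence(coherent_topic, tweets))
print(npmi_coherence(incoherent_topic, tweets))
```

A topic whose top words tend to appear in the same tweets scores higher than one that mixes unrelated words, which is the intuition behind comparing models by coherence. In practice the topic words would come from the fitted model rather than being written by hand.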

Topic Modelling on Consumer Financial Protection Bureau Data: An Approach Using BERT Based Embeddings

Exploring the Power of Topic Modeling Techniques in Analyzing Customer Reviews: A Comparative Analysis

A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts

Deep Learning based Topic Analysis on Financial Emerging Event Tweets

Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis

An Iterative Approach to Topic Modelling

Comparative Analysis of Thematic Modeling Methods for Analysis of Reviews in the Online Store of Digital Goods

Investigating topic modeling techniques through evaluation of topics discovered in short texts data across diverse domains

Topic Modelling of Empirical Text Corpora: Validity, Reliability, and Reproducibility in Comparison to Semantic Maps

A novel model for analyzing online customer experience in hotel services approach by topic modeling

Financial Topic Modeling Based on the BERT-LDA Embedding

Latent Dirichlet allocation (LDA) for topic modeling of the CFPB consumer complaints

The utility of topic modelling for discourse studies: A critical evaluation

Exploring climate change discourse on social media and blogs using a topic modeling analysis

An integrated clustering and BERT framework for improved topic modeling

Unveiling the Potential of BERTopic for Multilingual Fake News Analysis -- Use Case: Covid-19

Short Text Topic Modeling: Application to tweets about Bitcoin

Investigation of Topic Modelling Methods for Understanding the Reports of the Mining Projects in Queensland

Discovering Mental Health Research Topics with Topic Modeling

Clustering and Topic Modeling over Tweets: A Comparison over a Health Dataset