Visualizing Topic Uncertainty in Topic Modelling

Peter Winker
DOI: https://doi.org/10.48550/arXiv.2302.06482
2023-02-13
Abstract: Word clouds have become a standard tool for presenting the results of natural language processing methods such as topic modelling. They display the most important words, with word size often chosen proportional to each word's relevance within a topic. In the latent Dirichlet allocation (LDA) model, word clouds are graphical representations of a vector of word weights within a topic. These vectors result from a statistical procedure applied to a specific corpus. They are therefore subject to uncertainty from different sources, such as sample selection, random components in the optimization algorithm, or parameter settings. A novel approach for presenting word clouds that includes information on such types of uncertainty is introduced and illustrated with an application of the LDA model to conference abstracts.
Computation
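The abstract does not describe the visualization in detail, so the following is only a minimal Python sketch of the underlying idea, not the paper's method. It assumes that the same topic has been estimated several times (for example on bootstrap resamples of the corpus or with different random seeds) and that topics have already been matched across runs, so each run yields a word-to-weight mapping for one topic. The function names `summarize_topic_runs` and `font_sizes` are hypothetical. Each word's weight is summarized by its median and a 5% to 95% band, and all three values are mapped to font sizes proportional to weight, as in a conventional word cloud.

```python
from statistics import median


def quantile(values, q):
    """Quantile of a list using linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)                      # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize_topic_runs(runs, lower=0.05, upper=0.95):
    """Summarize one topic estimated in several runs (assumed already aligned).

    runs: list of dicts mapping word -> weight, one dict per replicate.
    A word missing from a run is treated as having weight 0 in that run.
    Returns word -> (lower quantile, median, upper quantile).
    """
    words = set().union(*runs)
    summary = {}
    for w in words:
        vals = [run.get(w, 0.0) for run in runs]
        summary[w] = (quantile(vals, lower), median(vals), quantile(vals, upper))
    return summary


def font_sizes(summary, max_pt=60.0):
    """Map (low, mid, high) weights to font sizes proportional to weight.

    The largest upper bound across all words is drawn at max_pt points.
    """
    top = max(hi for _, _, hi in summary.values()) or 1.0  # guard against all-zero weights
    return {w: tuple(round(max_pt * x / top, 1) for x in triple)
            for w, triple in summary.items()}


if __name__ == "__main__":
    # Made-up weights for one topic from three hypothetical replicate runs.
    runs = [
        {"model": 0.30, "data": 0.20, "topic": 0.10},
        {"model": 0.25, "data": 0.25, "topic": 0.05},
        {"model": 0.35, "data": 0.15},
    ]
    for word, (lo, mid, hi) in sorted(font_sizes(summarize_topic_runs(runs)).items()):
        print(f"{word}: {lo} pt / {mid} pt / {hi} pt")
```

A renderer could then draw each word at its median size and indicate the band, for instance with a lighter outline at the upper size; that rendering choice is likewise an assumption made for illustration.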