LDAExplore: Visualizing Topic Models Generated Using Latent Dirichlet Allocation

Ashwinkumar Ganesan, Kiante Brantley, Shimei Pan, Jian Chen
DOI: https://doi.org/10.48550/arXiv.1507.06593
2015-07-24
Abstract: We present LDAExplore, a tool to visualize topic distributions in a given document corpus that are generated using Topic Modeling methods. Latent Dirichlet Allocation (LDA) is one of the basic methods that is predominantly used to generate topics. One of the problems with methods like LDA is that users who apply them may not understand the topics that are generated. Also, users may find it difficult to search for correlated topics and correlated documents. LDAExplore tries to alleviate these problems by visualizing topic and word distributions generated from the document corpus and allowing the user to interact with them. The system is designed for users who have minimal knowledge of LDA or Topic Modeling methods. To evaluate our design, we run a pilot study that uses the abstracts of 322 Information Visualization papers, where every abstract is considered a document. The topics generated are then explored by users. The results show that users are able to find correlated documents and group them based on topics that are similar.
Information Retrieval, Human-Computer Interaction
What problem does this paper attempt to address?
The paper addresses two main problems:

1. **Understanding the generated topics**: Topics produced by topic-modeling methods such as LDA can be hard for users to interpret. Users may not know what a topic means, especially without relevant background knowledge.
2. **Searching for relevant topics and documents**: Users may struggle to find correlated topics and related documents. Traditional topic-modeling methods lack effective visualization tools, so it is hard to explore the relationships between topics and the associations between documents and topics.

To address these problems, the paper proposes a tool named **LDAExplore**, which offers the following features:

- **Visualizing topic distributions**: LDAExplore visualizes the topic distributions generated from the document corpus, so users can see the word distribution of each topic.
- **Interactive exploration**: Users can explore the relationships between topics and documents by interacting with the visualization, which helps them understand the generated topics.
- **Keyword search**: A keyword search function helps users quickly find documents related to specific topics.
- **Filtering**: Users can filter documents by criteria such as topic ranking and keywords to find relevant information more effectively.

With these features, LDAExplore aims to help users understand the topics generated by LDA and to search and organize relevant documents more efficiently.
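To make the filtering, keyword search, and correlated-document ideas above concrete, here is a minimal Python sketch. It is not LDAExplore's implementation: the topic weights, document texts, and every function name are hypothetical, and cosine similarity is only one assumed way to compare the topic distributions of two documents.

```python
from math import sqrt

# Hypothetical per-document topic distributions (each row sums to 1),
# standing in for the output of an LDA run. The numbers are made up.
doc_topics = {
    "doc_a": [0.70, 0.20, 0.10],
    "doc_b": [0.65, 0.25, 0.10],
    "doc_c": [0.05, 0.15, 0.80],
}

# Hypothetical abstracts, used only for the keyword search.
doc_text = {
    "doc_a": "graph layout for large networks",
    "doc_b": "interactive graph exploration",
    "doc_c": "volume rendering of medical data",
}


def topic_rank(doc_id, topic):
    """Return the rank (1 = strongest) of `topic` within one document."""
    weights = doc_topics[doc_id]
    order = sorted(range(len(weights)), key=lambda t: weights[t], reverse=True)
    return order.index(topic) + 1


def filter_by_topic_rank(topic, max_rank=1):
    """Keep documents in which `topic` is ranked `max_rank` or better."""
    return sorted(d for d in doc_topics if topic_rank(d, topic) <= max_rank)


def keyword_search(keyword):
    """Return documents whose text contains `keyword` as a whole word."""
    kw = keyword.lower()
    return sorted(d for d, text in doc_text.items() if kw in text.lower().split())


def cosine(u, v):
    """Cosine similarity between two topic-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def most_similar(doc_id):
    """Return the other document whose topic distribution is closest."""
    others = (d for d in doc_topics if d != doc_id)
    return max(others, key=lambda d: cosine(doc_topics[doc_id], doc_topics[d]))
```

For example, `filter_by_topic_rank(0)` keeps the documents whose strongest topic is topic 0, and `most_similar("doc_a")` picks the document with the closest topic mixture. A real tool would take these distributions from a fitted model rather than from hand-written values.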