Quality indices for topic model selection and evaluation: a literature review and case study

Christopher Meaney, Therese A. Stukel, Peter C. Austin, Rahim Moineddin, Michelle Greiver, Michael Escobar
DOI: https://doi.org/10.1186/s12911-023-02216-1
IF: 3.298
2023-07-24
BMC Medical Informatics and Decision Making
Abstract: Topic models are a class of unsupervised machine learning models, which facilitate summarization, browsing and retrieval from large unstructured document collections. This study reviews several methods for assessing the quality of unsupervised topic models estimated using non-negative matrix factorization. Techniques for topic model validation have been developed across disparate fields. We synthesize this literature, discuss the advantages and disadvantages of different techniques for topic model validation, and illustrate their usefulness for guiding model selection on a large clinical text corpus.
medical informatics