Finding mixed memberships in categorical data

Huan Qing
DOI: https://doi.org/10.1016/j.ins.2024.120785
2024-06-05
Abstract: Latent class analysis, a fundamental problem in categorical data analysis, often encounters overlapping latent classes, which introduce further challenges. This paper addresses this problem by finding the latent mixed memberships of subjects in categorical data with polytomous responses. We employ the Grade of Membership (GoM) model, which assigns each subject a membership score in each latent class. We propose two efficient spectral algorithms for estimating these mixed memberships and the other GoM parameters. Both algorithms are based on the singular value decomposition of a regularized Laplacian matrix. We establish their convergence rates under a mild condition on data sparsity. Additionally, we introduce a metric for evaluating the quality of estimated mixed memberships on real-world categorical data, and we use this metric to determine the optimal number of latent classes. Finally, we demonstrate the practicality of our methods through experiments on both computer-generated and real-world categorical datasets.
Social and Information Networks