StepMix: A Python Package for Pseudo-Likelihood Estimation of Generalized Mixture Models with External Variables

Sacha Morin, Robin Legault, Félix Laliberté, Zsuzsa Bakk, Charles-Édouard Giguère, Roxane de la Sablonnière, Éric Lacourse
DOI: https://doi.org/10.48550/arXiv.2304.03853
2024-06-17
Abstract: StepMix is an open-source Python package for the pseudo-likelihood estimation (one-, two- and three-step approaches) of generalized finite mixture models (latent profile and latent class analysis) with external variables (covariates and distal outcomes). In many applications in social sciences, the main objective is not only to cluster individuals into latent classes, but also to use these classes to develop more complex statistical models. These models generally divide into a measurement model that relates the latent classes to observed indicators, and a structural model that relates covariates and outcome variables to the latent classes. The measurement and structural models can be estimated jointly using the so-called one-step approach or sequentially using stepwise methods, which present significant advantages for practitioners regarding the interpretability of the estimated latent classes. In addition to the one-step approach, StepMix implements the most important stepwise estimation methods from the literature, including the bias-adjusted three-step methods with Bolck-Croon-Hagenaars and maximum likelihood corrections and the more recent two-step approach. These pseudo-likelihood estimators are presented in this paper under a unified framework as specific expectation-maximization subroutines. To facilitate and promote their adoption among the data science community, StepMix follows the object-oriented design of the scikit-learn library and provides an additional R wrapper.
Methodology, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is how to carry out pseudo-likelihood estimation of generalized mixture models with external variables (covariates and distal outcomes) in social science research, using the Python package StepMix. Specifically, the goals of the paper are:

1. **Provide an open-source tool**: Give researchers an easy-to-use Python package, StepMix, for pseudo-likelihood estimation of generalized mixture models with external variables.
2. **Support multiple estimation methods**: StepMix implements the classic one-step estimation method as well as the main stepwise estimation methods: the bias-adjusted three-step methods (with BCH and maximum likelihood corrections) and the more recent two-step method.
3. **Improve model interpretability**: Stepwise estimation keeps distal outcomes from influencing the definition of the latent classes, which makes the classes easier to interpret.
4. **Handle missing data**: StepMix handles missing values in indicators and distal outcomes through full-information maximum likelihood (FIML).
5. **Promote the adoption of the latest estimation methods**: By following the scikit-learn interface, the package makes state-of-the-art estimation methods accessible to more researchers.

### Specific problem description

In social science research, mixture models are often used to analyze multivariate continuous and categorical data and to discover hidden subgroups in a population. However, traditional estimation methods have limitations:

- The one-step method directly maximizes the joint likelihood, so the external variables also shape the latent classes, which can make the classes difficult to interpret.
- The naive three-step method yields biased estimates of the structural model when individuals are misclassified.
- Open-source software has lacked bias-adjusted stepwise estimation methods.

This paper therefore develops the StepMix package to provide more flexible and accurate estimation, especially for complex models with covariates and distal outcomes.

### Summary

The main goal of the paper is to introduce the StepMix package and to provide a self-contained reference for pseudo-likelihood estimation of mixture models with external variables. By implementing bias-adjusted stepwise estimation methods, StepMix aims to improve the interpretability of latent classes, handle missing data, and give researchers an easy-to-use open-source tool.
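
To make the naive three-step method mentioned above concrete, here is a minimal pure-Python sketch. It is not StepMix's implementation: the function names (`simulate`, `posteriors`, `fit_measurement`, `naive_three_step`) and the simulated parameter values are illustrative assumptions. It covers only the simplest case of two latent classes, binary indicators, and one continuous distal outcome. Step 1 fits the measurement model by expectation-maximization, step 2 assigns each individual to the most probable class, and step 3 estimates the outcome mean within each assigned class.

```python
import random


def simulate(n, seed=0):
    """Simulate 2 latent classes, 3 binary indicators, 1 continuous distal outcome.

    All parameter values below are arbitrary choices for illustration.
    """
    rng = random.Random(seed)
    item_probs = [[0.9, 0.8, 0.9], [0.1, 0.2, 0.1]]  # P(indicator = 1 | class)
    outcome_means = [2.0, -2.0]                       # E[outcome | class]
    X, Z = [], []
    for _ in range(n):
        k = 0 if rng.random() < 0.5 else 1
        X.append([1 if rng.random() < p else 0 for p in item_probs[k]])
        Z.append(rng.gauss(outcome_means[k], 1.0))
    return X, Z


def posteriors(X, weights, probs):
    """E-step: P(class | indicators) for every row of X."""
    post = []
    for x in X:
        joint = []
        for w, p in zip(weights, probs):
            lik = w
            for xj, pj in zip(x, p):
                lik *= pj if xj == 1 else 1.0 - pj
            joint.append(lik)
        total = sum(joint)
        post.append([j / total for j in joint])
    return post


def fit_measurement(X, n_classes=2, n_iter=200, seed=0):
    """Step 1: EM for a latent class model fitted on the indicators only."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    weights = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(n_classes)]
    for _ in range(n_iter):
        post = posteriors(X, weights, probs)
        for k in range(n_classes):
            nk = sum(r[k] for r in post)  # expected number of rows in class k
            weights[k] = nk / n
            for j in range(d):
                pj = sum(r[k] * x[j] for r, x in zip(post, X)) / nk
                probs[k][j] = min(max(pj, 1e-6), 1.0 - 1e-6)  # keep away from 0 and 1
    return weights, probs


def naive_three_step(X, Z, n_classes=2):
    """Steps 1 to 3 with modal assignment and no bias correction."""
    weights, probs = fit_measurement(X, n_classes)                       # step 1
    post = posteriors(X, weights, probs)
    labels = [max(range(n_classes), key=lambda k: r[k]) for r in post]   # step 2
    means = []
    for k in range(n_classes):                                           # step 3
        zk = [z for z, lab in zip(Z, labels) if lab == k]
        means.append(sum(zk) / len(zk) if zk else float("nan"))
    return weights, probs, labels, means


if __name__ == "__main__":
    X, Z = simulate(1000, seed=1)
    weights, probs, labels, means = naive_three_step(X, Z)
    print("class weights:", weights)
    print("naive outcome means:", means)
```

Because step 3 treats the assigned labels as if they were the true classes, any misclassification mixes observations from the two groups, which tends to pull the estimated class means toward each other. The bias-adjusted three-step methods and the two-step method that StepMix implements are designed to address this. Class labels are arbitrary here, so the order of the estimated means may be swapped relative to the simulation.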