Estimating the size of a closed population by modeling latent and observed heterogeneity

Francesco Bartolucci, Antonio Forcina
DOI: https://doi.org/10.1093/biomtc/ujae017
IF: 1.701
2024-03-27
Biometrics
Abstract: The paper extends the empirical likelihood (EL) approach of Liu et al. to a new and very flexible family of latent class models for capture-recapture data, also allowing for serial dependence on previous capture history, conditionally on latent type and covariates. The EL approach allows the overall population size to be estimated directly rather than by adding estimates conditional on covariate configurations. A Fisher-scoring algorithm for maximum likelihood estimation is proposed and a more efficient alternative to the traditional EL approach for estimating the non-parametric component is introduced; this allows us to show that the mapping between the non-parametric distribution of the covariates and the probabilities of being never captured is one-to-one and strictly increasing. Asymptotic results are outlined, and a procedure for constructing profile likelihood confidence intervals for the population size is presented. Two examples based on real data are used to illustrate the proposed approach, and a simulation study indicates that, when estimating the overall undercount, the method proposed here is substantially more efficient than the one based on conditional maximum likelihood estimation, especially when the sample size is not sufficiently large.
statistics & probability,mathematical & computational biology,biology
What problem does this paper attempt to address?
The problem this paper addresses is estimating the overall size of a closed population, especially when the data exhibit unobserved heterogeneity. Specifically, the authors extend the empirical likelihood (EL) method of Liu et al. to latent class models for capture-recapture data, allowing the capture probability to depend on previous capture history, covariates, and latent class.

### Main problems and methods

1. **Estimating the overall size of a closed population**:
   - The goal of the paper is to estimate the overall size of a closed population more accurately in the presence of unobserved heterogeneity and covariates.
   - Traditional methods such as conditional maximum likelihood estimation (CML) and unconditional maximum likelihood estimation (UML) perform poorly when the sample size is small, whereas the EL method estimates the overall size directly rather than by adding estimates conditional on covariate configurations.
2. **Introducing latent class models**:
   - The authors introduce latent class models in which the capture probability depends on latent class and covariates.
   - The latent classes account for unobserved heterogeneity among individuals, thereby improving the accuracy of the estimate.
3. **Considering the impact of capture history**:
   - The capture probability at a given occasion may depend on the previous capture history, conditionally on latent class and covariates.
   - The authors propose a recursive logistic parametrization to model this dependence.
4. **Improved empirical likelihood method**:
   - A new EL method for estimating the non-parametric component is proposed, which is computationally more direct and efficient than the traditional EL approach.
   - The new method shows that the mapping between the marginal distribution of the covariates and the probabilities of never being captured is one-to-one and strictly increasing, and it provides a tool for differentiating the profile likelihood function with respect to the marginal distribution of the covariates.
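To make items 2 and 3 concrete, the following is a minimal sketch of how a capture probability might depend on latent class, a covariate, and previous capture. The specific logit form, the single covariate `x`, and all names (`capture_prob`, `history_prob`, `mixture_history_prob`, `gamma`) are illustrative assumptions and are not taken from the paper, whose parametrization is more general.

```python
import math


def capture_prob(intercept, beta, x, gamma, previously_captured):
    """Logistic capture probability at one occasion (assumed form, not the paper's)."""
    eta = intercept + beta * x + (gamma if previously_captured else 0.0)
    return 1.0 / (1.0 + math.exp(-eta))


def history_prob(history, intercept, beta, x, gamma):
    """Probability of a 0/1 capture history for one latent class, built recursively."""
    prob = 1.0
    seen = False  # becomes True once the individual has been captured
    for captured in history:
        p = capture_prob(intercept, beta, x, gamma, seen)
        prob *= p if captured else 1.0 - p
        seen = seen or bool(captured)
    return prob


def mixture_history_prob(history, weights, intercepts, beta, x, gamma):
    """Average the class-specific history probabilities over the latent classes."""
    return sum(
        w * history_prob(history, a, beta, x, gamma)
        for w, a in zip(weights, intercepts)
    )


# Probability of never being captured over 5 occasions for covariate value x = 1.0,
# with two hypothetical latent classes.
phi = mixture_history_prob([0] * 5, [0.6, 0.4], [-1.0, 0.5], beta=0.3, x=1.0, gamma=0.8)
```

In this sketch the behavioral effect `gamma` never enters the all-zero history, so the probability of never being captured depends only on the class intercepts and the covariate effect.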
### Mathematical formulas

- **Estimation of overall size**:
  \[ \hat{N}_i^{(c)} = \frac{n_i}{1 - \hat{\varphi}_i^{(c)}} \]
  where \(\hat{\varphi}_i^{(c)}\) is the CML estimate of the probability that an individual in the \(i\)-th stratum has never been captured.
- **Log-likelihood function**:
  \[ L(N, \beta, \tau) = \log \Gamma(N + 1) - \log \Gamma(N - n + 1) + (N - n)\log(\varphi) + \sum_{i = 1}^{s}\left[y_i'\log(p_i) + n_i\log(\tau_i)\right] \]
- **Update equation**:
  \[ \tau^{(u + 1)} = \frac{1}{N}\left[n + (N - n)\,\text{diag}(\varphi^{(u)})\,\tau^{(u)}\right] \]

Through these methods, the paper aims to provide a more flexible and accurate framework for estimating the overall size of a closed population, especially in the presence of unobserved heterogeneity and covariates.
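As a rough illustration of the first two formulas above, the sketch below evaluates the stratum-wise estimate and the log-likelihood for given inputs. It rests on several assumptions not stated in the summary: the overall estimate is taken as the sum of the stratum estimates, \(\varphi\) is taken to be the \(\tau\)-weighted average of the stratum probabilities \(\varphi_i\), and \(y_i\) and \(p_i\) are treated as vectors of observed history counts and history probabilities in stratum \(i\). The function names are hypothetical.

```python
import math


def stratified_estimate(n_counts, phi_hat):
    """Sum of stratum-level estimates n_i / (1 - phi_i) (assumed aggregation)."""
    return sum(n_i / (1.0 - phi_i) for n_i, phi_i in zip(n_counts, phi_hat))


def log_likelihood(N, n_counts, phi_strata, tau, y, p):
    """Evaluate L(N, beta, tau) as written above, for fixed phi_i and p_i.

    Assumption: phi = sum_i tau_i * phi_i, the marginal probability of
    never being captured. y[i] and p[i] list the observed history counts
    and history probabilities in stratum i.
    """
    n = sum(n_counts)
    phi = sum(t * f for t, f in zip(tau, phi_strata))
    ll = math.lgamma(N + 1) - math.lgamma(N - n + 1) + (N - n) * math.log(phi)
    for y_i, p_i, n_i, tau_i in zip(y, p, n_counts, tau):
        # Skip zero counts so that log(0) is never evaluated for unobserved histories.
        ll += sum(c * math.log(q) for c, q in zip(y_i, p_i) if c > 0)
        ll += n_i * math.log(tau_i)
    return ll


# Two hypothetical strata with 30 and 20 observed individuals.
n_counts = [30, 20]
phi_hat = [0.4, 0.5]
N_hat = stratified_estimate(n_counts, phi_hat)

ll = log_likelihood(
    N=N_hat,
    n_counts=n_counts,
    phi_strata=phi_hat,
    tau=[0.55, 0.45],
    y=[[18, 12], [11, 9]],
    p=[[0.35, 0.25], [0.3, 0.2]],
)
```

Using `math.lgamma` lets `N` be treated as a continuous quantity, which is convenient when profiling the likelihood over a grid of candidate population sizes.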