Fitting copulas in the case of missing data

Eckhard Liebscher
DOI: https://doi.org/10.1007/s00362-024-01535-3
2024-03-27
Statistical Papers
Abstract: In this paper we deal with parametric estimation of the copula in the case of missing data. The data items with the same pattern of complete and missing data are combined into a subset. This approach corresponds to the MCAR model for missing data. We construct a specific Cramér–von Mises statistic as a sum of such statistics for the several missing data patterns. The minimization of the statistic gives the estimators for the parameters. We prove asymptotic normality of the parameter estimators and of the Cramér–von Mises statistic.
statistics & probability
What problem does this paper attempt to address?
The paper addresses parametric estimation of a copula when the data contain missing values. Observations that share the same pattern of observed and missing components are grouped into a subset, and a Cramér–von Mises statistic is computed for each subset. The overall statistic is the sum of these per-pattern statistics. Parameter estimates are obtained by minimizing it, and the paper proves asymptotic normality of both the parameter estimators and the Cramér–von Mises statistic.

### Main Problems and Methods

1. **Problem Definition**:
   - The paper deals with copula parameter estimation in the case of missing data.
   - The data are divided into subsets, each containing the observations that share one missing-data pattern.
2. **Methods**:
   - A specific Cramér–von Mises statistic is constructed as the sum of the Cramér–von Mises statistics of the subsets for the different missing-data patterns.
   - Parameter estimates are obtained by minimizing this statistic.
   - Asymptotic normality of the parameter estimates and of the Cramér–von Mises statistic is proven.

### Key Assumptions

- **MCAR Model**: Data are missing completely at random, that is, whether a data item is missing is independent of its actual value.
- **Compactness of Parameter Space**: The parameter space \(\Theta\) is compact.
- **Continuity of Weight Function**: The weight function \(w_\mu\) is Lipschitz continuous.

### Main Results

- **Consistency of Parameter Estimation**: Under appropriate assumptions, the parameter estimate \(\hat{\theta}_n\) converges to the true parameter value \(\theta_0\) with probability 1.
- **Asymptotic Normality**: The parameter estimate \(\hat{\theta}_n\) is asymptotically normally distributed.
- **Asymptotic Normality of Cramér–von Mises Statistic**: The asymptotic distribution of the Cramér–von Mises statistic \(\hat{D}_n(\hat{\theta}_n)\) is also a normal distribution.

### Application and Verification

- **Simulation Study**: The effectiveness of the method is verified through a simulation study. The results show that under different sample sizes, the parameter estimates can reasonably approximate the true values.
- **Application to Real Data**: An empirical analysis is carried out using the data in the TRY plant trait database. The results show that the product copula model performs well in fitting ecological data.

### Formula Summary

- **Cramér–von Mises Statistic**:
  \[
  \hat{D}_n(\theta) = \sum_{\mu = 1}^{m} \frac{1}{n_\mu} \sum_{i = 1}^{n_\mu} \left( \hat{H}_{n\mu}(Y_{\mu i}) - C_\mu(\tilde{F}^*_{n\mu}(Y_{\mu i}) \mid \theta) \right)^2 w_\mu(\tilde{F}^*_{n\mu}(Y_{\mu i}))
  \]
- **Asymptotic Normality**:
  \[
  \sqrt{n} \, (\hat{\theta}_n - \theta_0) \xrightarrow{D} N(0, \Sigma)
  \]
  where \(\Sigma = H^{-1} \Sigma_D H^{-1}\), \(H\) is the Hessian matrix of \(\theta \mapsto D(C, C(\cdot \mid \theta))\) at \(\theta = \theta_0\), and \(\Sigma_D\) is the covariance matrix.

Through these methods and results, the paper provides an effective way to perform copula parameter estimation in the presence of missing data.
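
The per-pattern minimum-distance idea summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several choices are assumptions made only for illustration: a one-parameter Clayton copula (whose lower-dimensional margins are again Clayton with the same parameter), a constant weight \(w_\mu \equiv 1\), marginal distribution functions estimated by rescaled ranks over all observed values of each coordinate, and skipping patterns with fewer than two observed coordinates. All function names (`clayton_cdf`, `pseudo_obs`, `cvm_statistic`, `fit_clayton`, `simulate_clayton_mcar`) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def clayton_cdf(u, theta):
    """Clayton copula CDF evaluated at each row of u (theta > 0)."""
    s = np.sum(u ** (-theta), axis=1) - u.shape[1] + 1.0
    return s ** (-1.0 / theta)


def pseudo_obs(data):
    """Rescaled ranks rank/(n_j + 1) per column, using only observed entries.

    Assumption of this sketch: margins are estimated from all observed
    values of a coordinate, regardless of the row's missing-data pattern.
    """
    out = np.full(data.shape, np.nan)
    for j in range(data.shape[1]):
        obs = ~np.isnan(data[:, j])
        x = data[obs, j]
        ranks = np.searchsorted(np.sort(x), x, side="right")
        out[obs, j] = ranks / (x.size + 1.0)
    return out


def cvm_statistic(theta, data, u):
    """Sum over missing-data patterns of per-pattern Cramer-von Mises distances."""
    observed = ~np.isnan(data)
    patterns = np.unique(observed.astype(int), axis=0).astype(bool)
    total = 0.0
    for pattern in patterns:
        if pattern.sum() < 2:
            continue  # fewer than two observed coordinates: no dependence information
        rows = np.all(observed == pattern, axis=1)
        y = data[rows][:, pattern]
        v = u[rows][:, pattern]
        # Empirical joint distribution function of the subsample at its own points:
        # h[i] is the fraction of rows k with y[k] <= y[i] in every coordinate.
        h = np.mean(np.all(y[None, :, :] <= y[:, None, :], axis=2), axis=1)
        # Weight function taken as 1 (assumption); the mean gives the 1/n_mu factor.
        total += np.mean((h - clayton_cdf(v, theta)) ** 2)
    return total


def fit_clayton(data, bounds=(0.05, 20.0)):
    """Minimize the summed statistic over a compact parameter interval."""
    u = pseudo_obs(data)
    res = minimize_scalar(cvm_statistic, bounds=bounds, args=(data, u), method="bounded")
    return res.x


def simulate_clayton_mcar(n, theta, p_miss, seed=0, dim=3):
    """Clayton sample (Marshall-Olkin construction) with entries deleted independently."""
    rng = np.random.default_rng(seed)
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))
    e = rng.exponential(size=(n, dim))
    u = (1.0 + e / v) ** (-1.0 / theta)
    u[rng.random((n, dim)) < p_miss] = np.nan  # MCAR: deletion ignores the values
    return u
```

As a usage example, `fit_clayton(simulate_clayton_mcar(800, 2.0, 0.2))` fits the parameter on simulated data with roughly 20% of entries missing; the estimate should land in the neighborhood of the simulated value 2, with the exact number depending on the seed. The bounded search interval mirrors the compactness assumption on \(\Theta\), and the pairwise comparison used for the empirical distribution function costs \(O(n_\mu^2)\) memory per pattern, which is acceptable for subsamples of a few hundred rows.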