Abstract: Missing data are common in longitudinal studies, and data that are missing not at random (MNAR) can bias parameter estimates and even distort the results of analyses. In this article, we used Monte Carlo simulation to compare two techniques for handling different types of missing data that rest on different missingness mechanisms: the maximum likelihood (ML) approach, which assumes the data are missing at random (MAR), and the Diggle-Kenward selection model, which assumes MNAR. Parameter estimates and standard errors from each method were compared under different model assumptions. Four potentially influential factors were considered: the dropout proportion, the sample size, the distribution shape (i.e., skewness and kurtosis), and the missingness mechanism. The results indicated the following. (1) The Diggle-Kenward selection model was less affected by the missingness mechanism than the ML approach. Under MAR, the Diggle-Kenward selection model remained stable and produced estimates similar to those of the ML approach. Under MNAR, the two methods gave similar estimates of the latent variances (σi² and σs²) but differed more in the latent means (μi and μs). (2) Distribution shape had a greater impact on the Diggle-Kenward selection model than on the ML approach, which was not easily affected by it. For the mean and variance of the intercept and the variance of the slope, sample size interacted significantly with the degree of skewness and kurtosis: with larger samples, the effect of distribution shape on estimation precision decreased. (3) When fitting a growth curve model, the variances of the latent variables (σi² and σs²) were influenced much more by distribution shape than were their means (μi and μs). (4) The dropout proportion was the main factor affecting the precision of parameter estimates, and a larger sample size improved estimation precision in most cases.
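The data-generating step of a Monte Carlo design like the one described above can be illustrated with a minimal sketch. It assumes a linear latent growth model with four equally spaced occasions, normal random intercepts and slopes, and a logistic dropout model in the spirit of the Diggle-Kenward selection model: a subject still in the study drops out at occasion j with probability logit⁻¹(ψ0 + ψ1·y(i,j−1) + ψ2·y(i,j)). When ψ2 = 0, dropout depends only on the previously observed response (MAR); when ψ2 ≠ 0, it also depends on the current, unobserved response (MNAR). The function names `simulate_lgm_dropout` and `summarize`, all parameter values, and the normal distributions are hypothetical choices for illustration, not the settings used in the study; nonnormal draws with chosen skewness and kurtosis would have to be substituted to mimic the distribution-shape factor. The sketch only generates data and does not fit the ML or Diggle-Kenward models. The observed-case mean it reports is a naive summary used only to make the selective dropout visible.

```python
import math
import random


def logistic(x: float) -> float:
    """Numerically stable inverse logit: maps a real number to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate_lgm_dropout(n, times=(0, 1, 2, 3), mu_i=10.0, mu_s=1.0,
                         sd_i=1.0, sd_s=0.5, sd_e=1.0,
                         psi=(-4.0, 0.0, 0.2), seed=None):
    """Simulate one data set from a linear growth model with monotone dropout.

    Hypothetical setup: intercept ~ N(mu_i, sd_i**2), slope ~ N(mu_s, sd_s**2),
    and y_ij = intercept + slope * t_j + e_ij with e_ij ~ N(0, sd_e**2).
    From the second occasion on, a subject still in the study drops out with
    probability logistic(psi0 + psi1 * y_prev + psi2 * y_curr); once a subject
    drops out, all later values are missing. psi2 == 0 gives MAR dropout,
    psi2 != 0 gives MNAR dropout. Returns (complete, observed) lists of rows;
    observed rows use None for missing values.
    """
    rng = random.Random(seed)
    psi0, psi1, psi2 = psi
    complete, observed = [], []
    for _ in range(n):
        b0 = rng.gauss(mu_i, sd_i)   # subject-specific intercept
        b1 = rng.gauss(mu_s, sd_s)   # subject-specific slope
        y = [b0 + b1 * t + rng.gauss(0.0, sd_e) for t in times]
        obs = [y[0]]                 # the first occasion is always observed
        dropped = False
        for j in range(1, len(times)):
            if not dropped:
                p = logistic(psi0 + psi1 * y[j - 1] + psi2 * y[j])
                dropped = rng.random() < p
            obs.append(None if dropped else y[j])
        complete.append(y)
        observed.append(obs)
    return complete, observed


def summarize(complete, observed):
    """Return (dropout rate, full-data mean, observed-case mean) at the last wave."""
    last = len(complete[0]) - 1
    full_mean = sum(row[last] for row in complete) / len(complete)
    seen = [row[last] for row in observed if row[last] is not None]
    dropout_rate = 1.0 - len(seen) / len(observed)
    obs_mean = sum(seen) / len(seen) if seen else float("nan")
    return dropout_rate, full_mean, obs_mean


if __name__ == "__main__":
    # Hypothetical settings: the same dropout strength placed either on the
    # previous observed response (MAR) or on the current response (MNAR).
    settings = {"MAR": (-4.0, 0.2, 0.0), "MNAR": (-4.0, 0.0, 0.2)}
    for label, psi in settings.items():
        comp, obs = simulate_lgm_dropout(5000, psi=psi, seed=2024)
        rate, full_mean, obs_mean = summarize(comp, obs)
        print(f"{label}: dropout by last wave = {rate:.2f}, "
              f"full-data mean = {full_mean:.2f}, "
              f"observed-case mean = {obs_mean:.2f}")
```

In both settings the subjects who remain at the last wave are a selected subsample. Under MAR, an ML analysis that uses every observed occasion can in principle correct for this selection; under MNAR it generally cannot, which is the situation the Diggle-Kenward model is designed to address under its own assumptions. A full Monte Carlo study would repeat this generation many times per condition and fit both models to each replicate; that fitting step is not shown here.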

The More Data, the Better? Demystifying Deletion-Based Methods in Linear Regression with Missing Data

Modeling of Correlated Cognitive Function and Functional Disability Outcomes with Bounded and Missing Data in a Longitudinal Aging Study

Handling Nonmonotone Missing Data with Available Complete-Case Missing Value Assumption

Impact of Missing Data on Correlation Coefficient Values: Deletion and Imputation Methods for Data Preparation

The Influence of Deleting Data on Missing Observations Model

Case-Deletion Diagnostics for Linear Mixed Models

Deletion Diagnostics for Nonparametric Mixed Models

A unified framework of analyzing missing data and variable selection using regularized likelihood

Regression Analysis with Individual-Specific Patterns of Missing Covariates

Extending the DeLong algorithm for comparing areas under correlated receiver operating characteristic curves with missing data

Missing Values in Big Data Research: Some Basic Skills

The Analysis of Social-Science Data with Missing Values

Missing Data Imputation: Focusing on Single Imputation

Imputations for High Missing Rate Data in Covariates Via Semi-supervised Learning Approach

LGM-based Analyses with Missing Data: Comparison Between ML Method and Diggle-Kenward Selection Model

Data Deletion for Linear Regression with Noisy SGD

Comparison of Maximum Likelihood Approach, Diggle-Kenward Selection Model, Pattern Mixture Model with MAR and MNAR Dropout Data

A Comparison of Three Popular Methods for Handling Missing Data: Complete-Case Analysis, Inverse Probability Weighting, and Multiple Imputation

Missing data approaches for longitudinal neuroimaging research: Examples from the Adolescent Brain and Cognitive Development (ABCD) Study

Rejoinder: Statistical Inference for Non-Ignorable Missing-Data Problems: a Selective Review