Abstract: Missing values, or incomplete data, are commonly encountered in clinical research and have been studied by many authors. Broadly, the causes of missing values in a study fall into two categories. The first category includes reasons that are not directly related to the study; for example, a patient may be lost to follow-up because he or she moves out of the area. Missing values of this kind can be regarded as missing completely at random. The second category includes reasons that are related to the study; for example, a patient may withdraw from the study because of treatment-emergent adverse events. In practice, it is not uncommon to have multiple assessments from each subject. Subjects with all observations missing are called unit nonrespondents. Because unit nonrespondents provide no useful information, they are usually excluded from the analysis. Subjects with some, but not all, observations missing are referred to as item nonrespondents. Excluding item nonrespondents from the analysis is considered contrary to the intent-to-treat (ITT) principle and is therefore not acceptable. In clinical research, the primary analysis is usually conducted on the ITT population, which includes all randomized subjects with at least one posttreatment evaluation; as a result, most item nonrespondents may be included in the ITT population. Excluding item nonrespondents may also seriously reduce the power and efficiency of the study. Two methods are commonly used to account for item nonrespondents. The first is the likelihood-based method. Under a parametric model, the marginal likelihood function for the observed responses is obtained by integrating out the missing responses. The parameter of interest can then be estimated by the maximum likelihood estimator (MLE), and a corresponding test (e.g., a likelihood ratio test) can be constructed.
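As a sketch of the likelihood-based approach, using notation of our own rather than the source's: write subject i's response vector as y_i = (y_{i,obs}, y_{i,mis}) with an assumed parametric density f(·; θ), and assume that subjects are independent and that the missingness mechanism is ignorable (missing at random, with the missingness parameters distinct from θ), which is the usual condition under which integrating out the missing responses gives valid likelihood inference.

```latex
\begin{align*}
L_{\mathrm{obs}}(\theta) &= \prod_{i=1}^{n} \int f\bigl(y_{i,\mathrm{obs}}, y_{i,\mathrm{mis}}; \theta\bigr)\, dy_{i,\mathrm{mis}}, \\
\hat{\theta} &= \arg\max_{\theta} L_{\mathrm{obs}}(\theta), \\
\Lambda &= -2\left[\log L_{\mathrm{obs}}(\hat{\theta}_0) - \log L_{\mathrm{obs}}(\hat{\theta})\right].
\end{align*}
```

Here \hat{\theta}_0 is the MLE under the null hypothesis H_0, and under standard regularity conditions \Lambda is referred to a \chi^2 distribution whose degrees of freedom equal the number of restrictions imposed by H_0. For a subject with no missing responses, the integral reduces to f(y_i; \theta).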
The merit of the likelihood-based method is that the resulting statistical procedures are usually efficient. The drawback is that the marginal likelihood can be difficult to compute, so special statistical or numerical algorithms are often needed to obtain the MLE. For example, the expectation–maximization (EM) algorithm is one of the most popular methods for obtaining the MLE when data are missing. The other method for handling item nonrespondents is imputation. Compared with the likelihood-based method, imputation is relatively simple and easy to apply. The idea is to treat the imputed values as if they were observed and then apply standard statistical software to obtain consistent estimators. Note, however, that the variability of an estimator obtained after imputation usually differs from that of the estimator obtained from complete data. Consequently, variance formulas designed for complete data cannot be used to estimate the variance of an estimator computed from imputed data. Two alternative approaches are available for estimating this variability. The first is based on a Taylor expansion and is referred to as the "linearization method." Its merit is that it requires less computation; its drawback is that the resulting formula can be very complicated or even intractable. The second approach is based on resampling methods (e.g., the bootstrap and the jackknife). Resampling is computationally intensive, but it is very easy to apply, and with fast computers it has become much more attractive in practice. Imputation is popular not only in clinical research but also in many other statistical fields, such as sample surveys.
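To make the variance point concrete, here is a minimal Python sketch under assumptions of our own: a single incomplete variable, values missing completely at random, and mean imputation (the source does not tie its discussion to any particular imputation method). The helper names `impute_mean`, `naive_variance` and `jackknife_variance` are hypothetical. The sketch contrasts the complete-data formula s²/n applied to the imputed data with a delete-one jackknife that re-imputes within each replicate, one standard way to make a resampling variance estimator reflect the imputation step.

```python
"""Mean imputation for one incomplete variable: naive vs. jackknife variance.

Illustrative sketch only. Assumes values are missing completely at random
(MCAR) and that None marks a missing value. All names are hypothetical.
"""
from statistics import mean, variance
from typing import List, Optional


def impute_mean(y: List[Optional[float]]) -> List[float]:
    """Replace each missing value with the mean of the observed (respondent) values."""
    observed = [v for v in y if v is not None]
    fill = mean(observed)  # requires at least one respondent
    return [fill if v is None else v for v in y]


def naive_variance(y: List[Optional[float]]) -> float:
    """Complete-data formula s^2 / n applied to the imputed data.

    Treats imputed values as real observations, so it understates the variance.
    """
    completed = impute_mean(y)
    return variance(completed) / len(completed)


def jackknife_variance(y: List[Optional[float]]) -> float:
    """Delete-one jackknife that re-imputes within each replicate.

    Replicate k drops unit k, redoes mean imputation from the remaining
    respondents, and recomputes the mean, so the replicates carry the
    variability introduced by the imputation step.
    """
    n = len(y)
    replicates = [mean(impute_mean(y[:k] + y[k + 1:])) for k in range(n)]
    center = mean(replicates)
    return (n - 1) / n * sum((t - center) ** 2 for t in replicates)


if __name__ == "__main__":
    y = [2.0, 4.0, 6.0, 8.0, None, None]  # two of six values missing
    print(round(naive_variance(y), 4))      # 0.6667
    print(round(jackknife_variance(y), 4))  # 1.8519
```

With four respondents and two nonrespondents, the naive formula gives about 0.667, well below the respondent-only estimate s_r²/r = (20/3)/4 ≈ 1.667, while the re-imputing jackknife gives about 1.852. Re-imputing in every replicate is the key design choice: a jackknife that simply deleted units from the already-imputed data set would treat the filled-in 5.0 values as real observations and reproduce the naive value.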
However, imputation methods in clinical research are more diverse than those in sample surveys because clinical study designs are more complex. As a result, the statistical properties of many imputation methods commonly used in clinical research are still unknown, whereas most imputation methods used in sample surveys are well studied. Imputation in clinical research therefore presents both a unique challenge and an opportunity for statisticians working in this area. In what follows, we summarize the most commonly used imputation methods and investigate their statistical properties. Recent developments will also be discussed.
