Abstract: Missing values or incomplete data are commonly encountered in clinical research and have been studied by many authors. The causes of missing values in a study can be classified into two categories. The first category includes reasons that are not directly related to the study. For example, a patient may be lost to follow-up because he or she moves out of the area. Missing values in this category can be considered missing completely at random. The second category includes reasons that are related to the study. For example, a patient may withdraw from the study because of treatment-emergent adverse events. In practice, it is not uncommon to have multiple assessments from each subject. Subjects with all observations missing are called unit nonrespondents. Because unit nonrespondents do not provide any useful information, they are usually excluded from the analysis. Subjects with some, but not all, observations missing are referred to as item nonrespondents. Excluding item nonrespondents from the analysis is considered to be against the intent-to-treat (ITT) principle and, hence, is not acceptable. In clinical research, the primary analysis is usually conducted on the ITT population, which includes all randomized subjects with at least one posttreatment evaluation. As a result, most item nonrespondents may be included in the ITT population. Excluding item nonrespondents may also seriously decrease the power/efficiency of the study. To account for item nonrespondents, two methods are commonly considered. The first is the so-called likelihood-based method. Under a parametric model, the marginal likelihood function for the observed responses is obtained by integrating out the missing responses. The parameter of interest can then be estimated by the maximum likelihood estimator (MLE), and a corresponding test (e.g., the likelihood ratio test) can be constructed.
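As a sketch of the likelihood-based construction just described, the observed-data likelihood can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the source: $y_{i,\mathrm{obs}}$ and $y_{i,\mathrm{mis}}$ denote the observed and missing responses of subject $i$, $f(\cdot;\theta)$ is the assumed parametric density, and $\hat{\theta}_0$ is the MLE under the null hypothesis. The missingness mechanism is left out of the likelihood, which is only appropriate when it can be ignored.

```latex
% Marginal (observed-data) likelihood: integrate out the missing responses
L_{\mathrm{obs}}(\theta)
  = \prod_{i=1}^{n} \int f\bigl(y_{i,\mathrm{obs}},\, y_{i,\mathrm{mis}};\, \theta\bigr)\, dy_{i,\mathrm{mis}},
\qquad
\ell_{\mathrm{obs}}(\theta) = \log L_{\mathrm{obs}}(\theta)

% Maximum likelihood estimator based on the observed responses
\hat{\theta} = \arg\max_{\theta}\, \ell_{\mathrm{obs}}(\theta)

% Likelihood ratio statistic for H_0: \theta \in \Theta_0
\Lambda = 2\,\bigl[\ell_{\mathrm{obs}}(\hat{\theta}) - \ell_{\mathrm{obs}}(\hat{\theta}_0)\bigr]
```

Under the usual regularity conditions, $\Lambda$ is compared with a chi-square reference distribution.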
The merit of the likelihood-based method is that the resulting statistical procedures are usually efficient. The drawback is that the marginal likelihood can be difficult to calculate. As a result, special statistical or numerical algorithms are commonly applied to obtain the MLE. For example, the expectation–maximization (EM) algorithm is one of the most popular methods for obtaining the MLE when there are missing data. The other method for item nonrespondents is imputation. Compared with the likelihood-based method, imputation is relatively simple and easy to apply. The idea of imputation is to treat the imputed values as observed values and then apply standard statistical software to obtain consistent estimators. However, it should be noted that the variability of an estimator obtained by imputation usually differs from that of the estimator obtained from the complete data. In this case, variance formulas designed for a complete data set cannot be used to estimate the variance of an estimator computed from the imputed data. As an alternative, two methods are considered for estimating its variability. One is based on Taylor's expansion and is referred to as the "linearization method." The merit of the linearization method is that it requires less computation. The drawback is that its formula can be very complicated and/or intractable. The other approach is based on resampling (e.g., the bootstrap and the jackknife). The drawback of the resampling method is that it is computationally intensive. The merit is that it is very easy to apply. With the help of fast computers, the resampling method has become much more attractive in practice. Note that imputation is popular not only in clinical research but also in many other statistical fields, such as sample surveys.
However, the imputation methods used in clinical research are more diversified than those used in sample surveys because of the complexity of the study designs. As a result, the statistical properties of many commonly used imputation methods in clinical research are still unknown, whereas most imputation methods used in sample surveys are well studied. Hence, imputation in clinical research presents both a unique challenge and an opportunity for statisticians working in this area. In what follows, we summarize the most commonly used imputation methods and investigate their statistical properties. Recent developments are also discussed.
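
The point made above about variance estimation after imputation can be illustrated with a minimal sketch. It assumes single mean imputation for one variable and a bootstrap that repeats the imputation inside every resample. Both choices, along with the function names `impute_mean`, `naive_variance`, and `bootstrap_variance` and the simulated data, are illustrative assumptions and not methods named in the source.

```python
import numpy as np

def impute_mean(y):
    """Replace missing values (NaN) with the mean of the observed values."""
    y = np.asarray(y, dtype=float)
    filled = y.copy()
    filled[np.isnan(y)] = np.nanmean(y)
    return filled

def naive_variance(y):
    """Complete-data variance formula for the mean, applied to imputed data.

    This treats imputed values as if they were observed, so it tends to
    understate the true variability of the estimator.
    """
    filled = impute_mean(y)
    return filled.var(ddof=1) / filled.size

def bootstrap_variance(y, n_boot=2000, seed=0):
    """Bootstrap variance of the imputed-data mean.

    Each replicate resamples subjects with replacement (missing values
    included) and then repeats the imputation on the resample.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    estimates = []
    for _ in range(n_boot):
        sample = y[rng.integers(0, n, size=n)]
        if np.all(np.isnan(sample)):
            continue  # nothing observed in this resample, so skip it
        estimates.append(impute_mean(sample).mean())
    return np.var(estimates, ddof=1)

# Simulated example (hypothetical data): 200 subjects, about 40% missing
# completely at random.
rng = np.random.default_rng(1)
y = rng.normal(loc=10.0, scale=2.0, size=200)
y[rng.random(200) < 0.4] = np.nan

print("naive variance:    ", naive_variance(y))
print("bootstrap variance:", bootstrap_variance(y))
```

In this setup the naive figure is expected to be noticeably smaller than the bootstrap figure, because mean imputation shrinks the sample variance while the formula still divides by the full sample size. The bootstrap is easy to write but repeats the imputation thousands of times, which is the computational cost mentioned above.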


Imputation in Clinical Research
