Abstract:

Importance: Scales derived from multi-item questionnaires commonly suffer from item non-response. The traditional solution computes a weighted mean (WMean) from the available responses, but it may overlook the intricacies of the missing data. More advanced methods such as multiple imputation (MI) handle a broader range of missing-data problems but demand more computational resources. Because researchers frequently use survey data in the All of Us Research Program (All of Us), it is imperative to determine whether the added computational burden of using MI to handle non-response is justified.

Objectives: Using the 5-item Physical Activity Neighborhood Environment Scale (PANES) in All of Us, this study assessed the tradeoff between efficacy and computational demands of WMean, MI, and inverse probability weighting (IPW) when dealing with item non-response.

Materials and methods: Synthetic missingness, allowing non-response on 1 or more items, was introduced into PANES under 3 missingness mechanisms and a range of missing percentages (10%-50%). In each scenario, WMean of the completed questions, MI, and IPW were compared on bias, variability, coverage probability, and computation time.

Results: All methods showed minimal bias (all <5.5%) when internal consistency was good, and WMean suffered the most when internal consistency was poor. IPW showed considerable variability as the missing percentage increased. MI required significantly more computational resources, taking >8000 times longer than WMean and >100 times longer than IPW in the full data analysis.

Discussion and conclusion: The marginal performance advantages of MI for item non-response in highly reliable scales do not warrant its escalated cloud computational burden in All of Us, particularly when coupled with computationally demanding post-imputation analyses. Researchers using survey scales with low missingness could use WMean to reduce the computing burden.
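To make the WMean idea concrete, the sketch below scores each respondent as the mean of the items they answered, after injecting synthetic missingness completely at random. It is an illustration only, not the study's code: the abstract does not specify the weighting scheme, the response scale, or how each missingness mechanism was generated, so the 1-4 response scale, the function names `introduce_mcar` and `available_item_mean`, and the `min_answered` threshold are all assumptions.

```python
import numpy as np


def introduce_mcar(items, missing_rate, rng):
    """Return a float copy of `items` with cells set to NaN independently
    with probability `missing_rate` (missing completely at random)."""
    masked = items.astype(float).copy()
    mask = rng.random(items.shape) < missing_rate
    masked[mask] = np.nan
    return masked


def available_item_mean(items, min_answered=1):
    """Score each respondent (row) as the mean of the answered items.

    Rows with fewer than `min_answered` answered items get NaN.
    `min_answered` is a hypothetical threshold, not taken from the study.
    """
    answered = (~np.isnan(items)).sum(axis=1)
    sums = np.nansum(items, axis=1)
    scores = np.full(items.shape[0], np.nan)
    ok = answered >= min_answered
    scores[ok] = sums[ok] / answered[ok]
    return scores


rng = np.random.default_rng(0)

# Hypothetical complete data: 1000 respondents, 5 items, responses 1-4.
full = rng.integers(1, 5, size=(1000, 5))

# Inject 10% missingness per cell, then score with the available items.
observed = introduce_mcar(full, missing_rate=0.10, rng=rng)
scores = available_item_mean(observed)

# Relative bias of the estimated mean scale score against the complete data.
true_mean = full.mean()
relative_bias = (np.nanmean(scores) - true_mean) / true_mean
print(f"relative bias: {relative_bias:.4%}")
```

Because the simulated items here are independent and missing completely at random, this toy setup says nothing about the poor-consistency or non-random scenarios the study evaluated; it only shows the scoring step whose cost the abstract compares against MI and IPW.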

Multiple imputation of multilevel missing data: An introduction to the R package pan

Multiple Imputation with Multivariate Imputation by Chained Equation (mice) Package

Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models with Local Dependence

Multiple imputation for longitudinal data: A tutorial

Missing Data Imputation: Focusing on Single Imputation.

Multiple Imputation for Multilevel Data with Continuous and Binary Variables

Implementing multiple imputations for addressing missing data in multireader multicase design studies

Missing Data Exploration: Highlighting Graphical Presentation of Missing Pattern.

Multiple Imputation with Factor Scores: A Practical Approach for Handling Simultaneous Missingness Across Items in Longitudinal Designs

Adapting tree-based multiple imputation methods for multi-level data? A simulation study

Dealing with missing data in multi-informant studies: A comparison of approaches

Solving the "many variables" problem in MICE with principal component regression

Imputing Missing Data by Fully Conditional Models: Some Cautionary Examples and Guidelines

Imputation methods for mixed datasets in bioarchaeology

Evaluating tree-based imputation methods as an alternative to MICE PMM for drawing inference in empirical studies

Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box

A short proof of Kuratowski's graph planarity criterion

Multiple Imputation When Variables Exceed Observations: An Overview of Challenges and Solutions

Imputation of Mixed Data With Multilevel Singular Value Decomposition

AN OVERVIEW OF MULTIPLE IMPUTATION

Balancing efficacy and computational burden: weighted mean, multiple imputation, and inverse probability weighting methods for item non-response in reliable scales