Abstract

Introduction

The National Health and Nutrition Examination Survey (NHANES) is a periodic survey conducted by the National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention. NHANES is designed to provide national estimates of the health and nutritional status of the civilian noninstitutionalized population. Sociodemographic and medical history information is obtained through household interviews, while physical measurements, physiological tests, and biochemical measurements are collected through standardized physical examinations in mobile examination centers (MECs).

The ongoing third NHANES (NHANES III) is the seventh in an extensive series of periodic health and nutrition surveys that NCHS has conducted since 1960. NHANES III, with approximately 40,000 sample persons aged 2 months and older, has been divided into two 3-year national samples. Phase 1 was conducted from October 1988 to October 1991, while Phase 2 will continue until October 1994. NHANES III is based on a complex, multistage area probability sample design and includes an oversample of children under 5 years of age, older Americans aged 60 years and over, and both black and Mexican-American persons. Details of the sample design of NHANES III have been published previously (1).

Like most sample surveys, NHANES III experiences both total (unit) nonresponse and item nonresponse. The missing data problem for NHANES III is somewhat unusual because sample persons can refuse to participate at three different stages of data collection. Unit nonresponse rates for NHANES III Phase 1 were 0% for the screening interview (with about 7% of the screening data obtained from neighbors), 14% for the household interview, and 22% for the physical examination. It is common survey practice to compensate for unit nonresponse through weighting class adjustments (2-5).
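A weighting class adjustment redistributes the base weight of nonrespondents to the respondents in the same class, so each class keeps its weighted total. The following is a minimal Python sketch of that general idea, assuming each unit carries a hypothetical class label, base weight, and response flag; it is not the specific NHANES III Phase 1 procedure, whose cells and steps are described in reference (6).

```python
from collections import defaultdict


def weighting_class_adjustment(units):
    """Weighting-class adjustment for unit nonresponse (illustrative sketch).

    Each unit is a dict with hypothetical keys:
      "cls"       - weighting-class label (e.g., an age-by-race/ethnicity cell)
      "weight"    - base sampling weight
      "responded" - True if the unit completed this stage of data collection
    Within each class, respondents' weights are multiplied by
    (sum of all weights in class) / (sum of respondent weights in class),
    so the class total is preserved. Nonrespondents get weight 0.
    """
    total = defaultdict(float)
    resp = defaultdict(float)
    for u in units:
        total[u["cls"]] += u["weight"]
        if u["responded"]:
            resp[u["cls"]] += u["weight"]

    # A class with no respondents cannot be adjusted; in practice such
    # classes are collapsed with neighboring ones before adjusting.
    for cls in total:
        if total[cls] > 0 and resp[cls] == 0:
            raise ValueError(f"class {cls!r} has no respondents; collapse it first")

    return [
        u["weight"] * total[u["cls"]] / resp[u["cls"]] if u["responded"] else 0.0
        for u in units
    ]
```

For example, a class with respondent weights 10 and 10 and a nonrespondent weight of 20 gives an adjustment factor of 40/20 = 2, so each respondent's weight becomes 20 and the class total of 40 is unchanged.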
The adjustments used to reduce potential nonresponse bias for NHANES III Phase 1 have been described previously (6). In addition to unit nonresponse, various levels of item nonresponse occur in NHANES III. In Phase 1, item nonresponse of 1-5% occurred for the household interview questions. In addition, some components of the physical examination were not successfully completed for all sample persons. Furthermore, some examination components include a number of individual measurements (e.g., body measurements), some of which may be missing. Item nonresponse rates for the individual components ranged from 5% to 8%.

Item nonresponse is generally handled by some type of imputation. Imputation methods fill in missing items with values from similar units in the dataset or with predicted values obtained from a model, making it possible to analyze the data as if they were complete. Common imputation methods used in surveys include deductive imputation, mean imputation, hot-deck imputation, cold-deck imputation, regression imputation, stochastic regression imputation, multiple imputation, and composite methods (7). Each has relative advantages and disadvantages, and the method of choice for a given survey may depend on particular circumstances, including the type of survey data and the availability of computer hardware and software. Beyond allowing complete-data methods of analysis, multiple imputation makes it possible to assess the impact of missing-data uncertainty on variances and to revise variance estimates to reflect that additional uncertainty (8).

In previous NHANES surveys, imputation for item nonresponse was done on an ad hoc basis. The purpose of this paper is to describe research comparing alternative missing data adjustment methods, based on single and multiple imputation, for selected survey components in NHANES III Phase 1.
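Hot-deck imputation and the variance benefit of multiple imputation can be illustrated together. The sketch below uses the approximate Bayesian bootstrap (ABB) hot deck, a standard way to make hot-deck imputation "proper" for multiple imputation, and pools the m completed-data estimates of a mean with Rubin's rules. It is an illustration under stated assumptions, not necessarily the method evaluated for NHANES III: the imputation cells are hypothetical, and each completed-data variance is taken as s²/n, which assumes simple random sampling and ignores the survey weights and the multistage design.

```python
import random
import statistics
from collections import defaultdict


def abb_hot_deck(values, cells, rng):
    """One approximate Bayesian bootstrap (ABB) hot-deck imputation.

    values: floats, with None marking item nonresponse
    cells:  imputation-cell label per unit (hypothetical, e.g. age-by-sex)
    Within each cell the observed donors are first resampled with
    replacement; each missing value is then drawn from that resample.
    """
    donors = defaultdict(list)
    for v, c in zip(values, cells):
        if v is not None:
            donors[c].append(v)
    for v, c in zip(values, cells):
        if v is None and not donors[c]:
            raise ValueError(f"cell {c!r} has missing values but no donors")
    pools = {c: rng.choices(d, k=len(d)) for c, d in donors.items() if d}
    return [v if v is not None else rng.choice(pools[c])
            for v, c in zip(values, cells)]


def rubin_combine(estimates, variances):
    """Rubin's rules: pool m completed-data estimates and their variances."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # multiple-imputation point estimate
    w_bar = statistics.fmean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance
    return q_bar, w_bar + (1 + 1 / m) * b    # (estimate, total variance)


def mi_mean(values, cells, m=5, seed=0):
    """Mean of `values` under m ABB hot-deck imputations (m >= 2).

    Simplification: each completed-data variance is s^2 / n, i.e. simple
    random sampling; this ignores survey weights and the multistage design.
    """
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        y = abb_hot_deck(values, cells, rng)
        estimates.append(statistics.fmean(y))
        variances.append(statistics.variance(y) / len(y))
    return rubin_combine(estimates, variances)
```

The bootstrap step matters: simply repeating an ordinary random hot deck m times treats the observed donor distribution as known, which tends to understate the between-imputation variance b and hence the total variance that multiple imputation is meant to capture.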
The information contained in this paper is based in part on a special project carried out during 1992 and described in a final report by Datametrics Research, Inc. (9).

Predicting blood pressure under circumstances of missing data: An analysis of missing data patterns and imputation methods using NHANES

Use of Sequential Hot-Deck Imputation for Missing Health Care Systems Data for Population Health Research

A comparison of imputation techniques in the third national health and nutrition examination survey

Imputation of missing values for electronic health record laboratory data

Comparison of Missing Data Imputation Methods using the Framingham Heart study dataset

Missing Values in Big Data Research: Some Basic Skills

Missing Data Statistics Provide Causal Insights into Data Loss in Diabetes Health Monitoring by Wearable Sensors

Use of Censored Data Methods to Estimate Blood Pressure Percentiles in US Adults

Analysis of Missingness Scenarios for Observational Health Data

Studies on snake venom. XIII. Chromatographic separation and properties of three proteinases from Agkistrodon halys blomhoffii venom.

Matrix Completion for Survey Data Prediction with Multivariate Missingness

The Impact of Missing Continuous Blood Glucose Samples on Machine Learning Models for Predicting Postprandial Hypoglycemia: An Experimental Analysis

Extremely missing numerical data in Electronic Health Records for machine learning can be managed through simple imputation methods considering informative missingness: A comparative of solutions in a COVID-19 mortality case study

Strategies for handling missing data that improve Frailty Index estimation and predictive power: lessons from the NHANES dataset

Exploring Predictive Methods for Cardiovascular Disease: A Survey of Methods and Applications

CHOOSING APPROPRIATE IMPUTATION METHODS FOR MISSING DATA: A DECISION ALGORITHM ON METHODS FOR MISSING DATA

Multiple Imputation for Incomplete Data in Epidemiologic Studies

A method for comparing multiple imputation techniques: A case study on the U.S. national COVID cohort collaborative

Assessing the Impact of Imputation on the Interpretations of Prediction Models: A Case Study on Mortality Prediction for Patients with Acute Myocardial Infarction.

19 Incomplete Data in Epidemiology and Medical Statistics

Accurate Prediction of Coronary Heart Disease for Patients With Hypertension From Electronic Health Records With Big Data and Machine-Learning Methods: Model Development and Performance Evaluation