nhanesA: achieving transparency and reproducibility in NHANES research

Laha Ale, Robert Gentleman, Teresa Filshtein Sonmez, Deepayan Sarkar, Christopher Endres
DOI: https://doi.org/10.1093/database/baae028
2024-01-01
Database
Abstract: The National Health and Nutrition Examination Survey provides comprehensive data on demographics, sociology, health and nutrition. Conducted in 2-year cycles since 1999, most of its data are publicly accessible, making it pivotal for research areas like studying social determinants of health or tracking trends in health metrics such as obesity or diabetes. Assembling the data and analyzing it presents a number of technical and analytic challenges. This paper introduces the nhanesA R package, which is designed to assist researchers in data retrieval and analysis and to enable the sharing and extension of prior research efforts. We believe that fostering community-driven activity in data reproducibility and sharing of analytic methods will greatly benefit the scientific community and propel scientific advancements. Database URL: https://github.com/cjendres1/nhanes
mathematical & computational biology
What problem does this paper attempt to address?
The paper addresses the technical and analytical challenges of working with National Health and Nutrition Examination Survey (NHANES) data by introducing the `nhanesA` R package. NHANES covers extensive health and nutrition information on the US population, which makes it central to studying social determinants of health and tracking trends in health indicators such as obesity and diabetes. Assembling and analyzing these data, however, is technically demanding. To promote transparency and reproducibility, the authors developed `nhanesA` to help researchers retrieve and analyze NHANES data and to make it easier to share and extend earlier research. The package provides functions for searching for relevant variables and data files, downloading data to a local machine, aligning tables within a survey cycle, and aligning tables across cycles. The paper also stresses that survey weights must be used correctly to obtain valid estimates, and it discusses specific difficulties in NHANES data such as missing values and data coarsening. Through these tools and methods, the authors aim to encourage community-driven reproducibility and the sharing of analytic approaches, and thereby to advance scientific research.
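The alignment and weighting steps described above can be illustrated with a small sketch. This is not the `nhanesA` API (the package is written in R); it is a stdlib-only Python illustration on toy numbers, not real NHANES values. It assumes, as in NHANES files, that tables within a cycle share the respondent identifier `SEQN` and that `WTMEC2YR` is the 2-year examination weight; the BMI column `BMXBMI` and all values are illustrative. The helper names `merge_on_seqn`, `weighted_mean` and `stack_cycles` are hypothetical. Dividing each weight by the number of pooled cycles follows the general NCHS guidance for combining 2-year cycles and is a simplification here (the 1999 to 2002 cycles have their own 4-year weights).

```python
# Hypothetical sketch (not the nhanesA API): align NHANES-style tables on SEQN,
# pool cycles, and compute a survey-weighted mean. All data are toy values.

# Hypothetical demographics table for one cycle: SEQN -> 2-year exam weight.
demo = {
    1: {"WTMEC2YR": 1000.0},
    2: {"WTMEC2YR": 3000.0},
    3: {"WTMEC2YR": 2000.0},
}

# Hypothetical body-measures table; SEQN 3 is absent, SEQN 4 has no demographics row.
bmx = {
    1: {"BMXBMI": 22.0},
    2: {"BMXBMI": 30.0},
    4: {"BMXBMI": 25.0},
}


def merge_on_seqn(left, right):
    """Inner join within a cycle: keep participants present in both tables."""
    return {
        seqn: {**left[seqn], **right[seqn]}
        for seqn in sorted(left.keys() & right.keys())
    }


def weighted_mean(rows, value, weight):
    """Survey-weighted mean, skipping rows with a missing value or weight."""
    pairs = [
        (r[value], r[weight])
        for r in rows.values()
        if r.get(value) is not None and r.get(weight)
    ]
    total_w = sum(w for _, w in pairs)
    if total_w == 0:
        raise ValueError("no usable rows with positive weight")
    return sum(v * w for v, w in pairs) / total_w


def stack_cycles(cycles, weight):
    """Pool cycles, dividing each 2-year weight by the number of cycles.

    Assumes SEQN values do not collide across cycles.
    """
    n = len(cycles)
    pooled = {}
    for table in cycles:
        for seqn, row in table.items():
            pooled[seqn] = {**row, weight: row[weight] / n}
    return pooled


merged = merge_on_seqn(demo, bmx)
unweighted = sum(r["BMXBMI"] for r in merged.values()) / len(merged)
weighted = weighted_mean(merged, "BMXBMI", "WTMEC2YR")
print(unweighted, weighted)  # prints: 26.0 28.0

# A second hypothetical cycle, already merged, pooled with the first.
other_cycle = {10: {"WTMEC2YR": 2000.0, "BMXBMI": 26.0}}
pooled = stack_cycles([merged, other_cycle], "WTMEC2YR")
pooled_mean = weighted_mean(pooled, "BMXBMI", "WTMEC2YR")
print(round(pooled_mean, 3))
```

The gap between the unweighted and weighted means shows why ignoring weights can bias estimates. A weighted mean gives only a point estimate; valid standard errors also require the design variables (strata and primary sampling units), which is why design-based software such as the R `survey` package is normally used for NHANES analyses.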