A fully Bayesian semi-parametric scalar-on-function regression (SoFR) with measurement error using instrumental variables

Roger S. Zoh, Yuanyuan Luan, Carmen Tekwe
DOI: https://doi.org/10.48550/arXiv.2202.00711
2022-11-09
Abstract: Wearable devices such as the ActiGraph are now commonly used in health studies to monitor or track physical activity. This trend aligns well with the growing need to accurately assess the effects of physical activity on health outcomes such as obesity. When assessing the association between these device-based physical activity measures and health outcomes such as body mass index, the device-based data are treated as functions, while the outcome is scalar-valued. The regression model applied in these settings is the scalar-on-function regression (SoFR). Most estimation approaches in SoFR assume that the functional covariates are precisely observed, or the measurement errors are considered random errors. Violation of this assumption can lead to both under-estimation of the model parameters and sub-optimal analysis. The literature on a measurement error-corrected approach in SoFR is sparse in the non-Bayesian literature and virtually non-existent in the Bayesian literature. This paper considers a fully nonparametric Bayesian measurement error-corrected SoFR model that relaxes all the constraining assumptions often made in these models. Our estimation relies on an instrumental variable (IV) to identify the measurement error model. We also introduce an IV quality scalar parameter that is jointly estimated along with all model parameters. Our method is easy to implement, and we demonstrate its finite sample properties through an extensive simulation. Finally, the developed methods are applied to the National Health and Nutrition Examination Survey to assess the relationship between wearable-device-based measures of physical activity and body mass index among adults living in the United States.
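As a reading aid, the setting described in the abstract can be written as a generic scalar-on-function regression with an error-prone functional covariate and a functional instrument. The notation below is assumed for illustration and is not taken from the paper: $Y_i$ is the scalar outcome, $X_i(t)$ the true (unobserved) activity curve, $W_i(t)$ its device-based measurement, $M_i(t)$ the instrument, and $\delta(t)$ a possible time-varying bias factor in the instrument.

```latex
\begin{aligned}
Y_i    &= \alpha + \int_0^1 \beta(t)\, X_i(t)\, dt + \varepsilon_i, \\
W_i(t) &= X_i(t) + U_i(t), \\
M_i(t) &= \delta(t)\, X_i(t) + \eta_i(t), \qquad i = 1, \dots, n .
\end{aligned}
```

Under this sketch, regressing $Y_i$ directly on $W_i(t)$ ignores $U_i(t)$, which is the source of the under-estimation the abstract mentions. The instrument $M_i(t)$ supplies a second, independently contaminated view of $X_i(t)$ that helps identify the measurement error model.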
Methodology
What problem does this paper attempt to address?
The paper addresses how to correct for measurement error in functional covariates when relating physical activity data recorded by wearable devices (such as the ActiGraph) to health outcomes (for example, body mass index, BMI). Specifically, the authors propose a fully Bayesian semi-parametric scalar-on-function regression (SoFR) model. The model handles measurement error in the functional covariate and does not require the instrumental variable (IV) to be unbiased. The method is aimed at high-dimensional longitudinal or functional data with complex, heterogeneous covariance structures, which frequently sampled wearable devices typically produce.

### Background and Problem of the Paper

As wearable devices become common in research, there is a growing need to accurately assess the relationship between physical activity and health outcomes such as obesity. The physical activity measurements these devices produce are best treated as functions when evaluating their association with a scalar outcome such as BMI, and SoFR is the regression model suited to this setting. However, most SoFR estimation methods assume either that the functional covariate is observed precisely or that its measurement error is purely random noise. Violating this assumption can lead to under-estimation of the model parameters. Bayesian methods that correct for measurement error in this setting are virtually non-existent.

### Solution

The paper proposes a new Bayesian semi-parametric, measurement error-corrected SoFR model that relaxes the restrictive assumptions made in earlier models. The key points of the method are:

1. **Bayesian Framework**: A fully Bayesian approach that allows an arbitrary measurement error distribution, removes the need for two-stage estimation, and quantifies the uncertainty of the parameter estimates directly through the posterior.
2. **Instrumental Variable**: A function-valued instrumental variable (IV) is used to identify the model, but the assumption that the IV is unbiased is dropped, so the IV may carry a time-varying bias factor.
3. **Model Flexibility**: A truncated Dirichlet process mixture (tDPM) prior allows the distribution of the error terms to be asymmetric, which makes the model more flexible.

### Application Example

The authors apply the method to data from the National Health and Nutrition Examination Survey (NHANES) to evaluate the relationship between wearable-device-based physical activity intensity and BMI among adults in the United States. According to the reported results, the method estimates the functional parameter β(t) well and reveals the association between physical activity patterns and BMI.

### Summary

Overall, the paper proposes a new Bayesian semi-parametric SoFR model that corrects for measurement error in functional covariates, including the case where the instrumental variable has a time-varying bias factor. The authors support the method with extensive simulations and an application to real data. The method offers a new tool for complex, high-dimensional longitudinal data and for assessing the relationship between physical activity and health outcomes more accurately.
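To make the truncated Dirichlet process mixture (tDPM) idea concrete, here is a minimal Python sketch of a truncated stick-breaking construction used to generate a possibly asymmetric error distribution as a finite mixture of normals. It is not the authors' implementation. The function names, the truncation level `K`, the concentration `alpha`, and the hyperparameters for the component means and variances are all assumptions chosen for illustration, and the sketch does not impose any mean-zero constraint on the errors.

```python
import numpy as np


def truncated_stick_breaking(alpha, K, rng):
    """Draw K mixture weights from a stick-breaking prior truncated at K."""
    # Break proportions V_k ~ Beta(1, alpha).
    v = rng.beta(1.0, alpha, size=K)
    # Truncation: the last break takes whatever is left of the stick,
    # so the K weights sum to one.
    v[-1] = 1.0
    # Length of stick remaining before each break: prod_{j<k} (1 - V_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_errors(n, weights, means, sds, rng):
    """Sample n errors from the normal mixture defined by the weights."""
    component = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[component], sds[component])


rng = np.random.default_rng(0)
K = 10  # truncation level (illustrative assumption)

weights = truncated_stick_breaking(alpha=1.0, K=K, rng=rng)
means = rng.normal(0.0, 2.0, size=K)              # component locations
sds = np.sqrt(1.0 / rng.gamma(2.0, 1.0, size=K))  # component scales
errors = sample_errors(5000, weights, means, sds, rng)

print("weights sum to:", weights.sum())
print("sample mean and sd of errors:", errors.mean(), errors.std())
```

Because the component means differ and the weights are unequal, the resulting mixture is generally skewed or multimodal, which is the kind of flexibility a single normal error model cannot provide. In a full Bayesian fit the weights, means, and scales would be updated from the data rather than drawn once from the prior as they are here.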