Fast variable selection for distributional regression with application to continuous glucose monitoring data

Alexander Coulter, Rashmi N. Aurora, Naresh M. Punjabi, Irina Gaynanova
2024-03-02
Abstract: With the growing prevalence of diabetes and the associated public health burden, it is crucial to identify modifiable factors that could improve patients' glycemic control. In this work, we seek to examine associations between medication usage, concurrent comorbidities, and glycemic control, utilizing data from continuous glucose monitors (CGMs). CGMs provide interstitial glucose measurements, but reducing data to simple statistical summaries is common in clinical studies, resulting in substantial information loss. Recent advancements in the Fréchet regression framework make it possible to use more information by treating the full distributional representation of CGM data as the response, while sparsity regularization enables variable selection. However, the methodology does not scale to large datasets. Crucially, variable selection inference using subsampling methods is computationally infeasible. We develop a new algorithm for sparse distributional regression by deriving a new explicit characterization of the gradient and Hessian of the underlying objective function, while also utilizing rotations on the sphere to perform feasible updates. The updated method is up to 10,000-fold faster than the original approach, opening the door for applying sparse distributional regression to large-scale datasets and enabling previously unattainable subsampling-based inference. Applying our method to CGM data from patients with type 2 diabetes and obstructive sleep apnea, we found a significant association between sulfonylurea medication and glucose variability without evidence of association with glucose mean. We also found that overnight oxygen desaturation variability showed a stronger association with glucose regulation than overall oxygen desaturation levels.
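To illustrate the information loss the abstract describes, here is a minimal sketch comparing a summary statistic with a distributional representation. The glucose values, the function names `empirical_quantiles` and `quantile_distance`, and the choice of an L2 distance between quantile functions are illustrative assumptions, not taken from the paper.

```python
from math import sqrt
from statistics import mean


def empirical_quantiles(readings, grid_size=5):
    """Evaluate the empirical quantile function on an evenly spaced
    probability grid (requires grid_size >= 2), interpolating linearly
    between order statistics."""
    ordered = sorted(readings)
    n = len(ordered)
    values = []
    for k in range(grid_size):
        p = k / (grid_size - 1)      # probability level in [0, 1]
        pos = p * (n - 1)            # fractional index into the sorted data
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        values.append(ordered[lo] * (1 - frac) + ordered[hi] * frac)
    return values


def quantile_distance(q1, q2):
    """Root-mean-square difference between two quantile functions
    evaluated on the same grid (an assumed choice of distance)."""
    return sqrt(mean([(a - b) ** 2 for a, b in zip(q1, q2)]))


# Hypothetical glucose readings in mg/dL for two patients.
steady = [110, 115, 120, 125, 130]
variable = [60, 90, 120, 150, 180]

q_steady = empirical_quantiles(steady)
q_variable = empirical_quantiles(variable)
distance = quantile_distance(q_steady, q_variable)
```

Both hypothetical patients have the same mean glucose, so a mean-only summary cannot tell them apart, while their quantile functions differ clearly. This is the kind of information a distributional response retains.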
Applications
What problem does this paper attempt to address?
The paper attempts to address the following issues:

1. **Scalability of sparse distributional regression**: Existing methods hit computational bottlenecks on large datasets. The paper develops a new algorithm based on explicit expressions for the gradient and Hessian of the objective function, together with rotations on the sphere to keep updates feasible, making it up to 10,000 times faster than the original approach.
2. **Variable selection and inference**: The speedup makes subsampling-based variable selection inference, such as stability selection, feasible. This was not achievable with the original algorithm because of its computational cost.
3. **Application to real data**: The method is applied to continuous glucose monitoring (CGM) data from patients with type 2 diabetes and obstructive sleep apnea (OSA), exploring the relationship between medication use, comorbidities, and glycemic control. The study found:
   - Sulfonylurea medication is significantly associated with glucose variability, with no evidence of association with mean glucose.
   - Variability in overnight oxygen desaturation is more strongly associated with glucose regulation than overall oxygen desaturation levels.

Through these contributions, the paper aims to improve the understanding of glycemic control in patients with diabetes and provides tools for future large-scale studies.
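The subsampling idea behind stability selection can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's algorithm: `top_k_by_correlation` is a hypothetical stand-in for the sparse distributional regression fit, the data are simulated with a scalar response, and the 0.8 frequency threshold is an arbitrary choice.

```python
import random


def top_k_by_correlation(X, y, k):
    """Hypothetical stand-in selector: indices of the k predictors whose
    score (proportional to |correlation| with y) is largest."""
    n, p = len(y), len(X[0])
    y_mean = sum(y) / n
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        cov = sum((c - c_mean) * (v - y_mean) for c, v in zip(col, y))
        var = sum((c - c_mean) ** 2 for c in col)
        scores.append(abs(cov) / var ** 0.5 if var > 0 else 0.0)
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def selection_frequencies(X, y, selector, n_subsamples=50, seed=0):
    """Refit the selector on random half-samples and return, for each
    predictor, the fraction of subsamples in which it was selected."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)   # half-sample without replacement
        chosen = selector([X[i] for i in idx], [y[i] for i in idx])
        for j in chosen:
            counts[j] += 1
    return [c / n_subsamples for c in counts]


# Simulated data (assumption): only predictor 0 drives the response.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [2.0 * row[0] + data_rng.gauss(0, 0.5) for row in X]

freqs = selection_frequencies(
    X, y, lambda Xs, ys: top_k_by_correlation(Xs, ys, k=1)
)
stable = [j for j, f in enumerate(freqs) if f >= 0.8]
```

Because the selector is refit once per subsample, the total cost is roughly the number of subsamples times the cost of a single fit, which is why a large speedup of the underlying solver matters for this kind of inference.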