Abstract: This work considers the problem of fitting functional models with sparsely and irregularly sampled functional data. It overcomes the limitations of the state-of-the-art methods, which face major challenges in the fitting of more complex non-linear models. Currently, many of these models cannot be consistently estimated unless the number of observed points per curve grows sufficiently quickly with the sample size, whereas we show numerically that a modified approach with more modern multiple imputation methods can produce better estimates in general. We also propose a new imputation approach that combines the ideas of MissForest with Local Linear Forest and compare their performance with PACE and several other multivariate multiple imputation methods. This work is motivated by a longitudinal study on smoking cessation, in which the Electronic Health Records (EHR) from Penn State PaTH to Health allow for the collection of a great deal of data, with highly variable sampling. To illustrate our approach, we explore the relation between relapse and diastolic blood pressure. We also consider a variety of simulation schemes with varying levels of sparsity to validate our methods.

What problem does this paper attempt to address?

This paper addresses the problem of fitting functional models, particularly complex nonlinear ones, to sparsely and irregularly sampled functional data. Many such models cannot currently be estimated consistently unless the number of observed points per curve grows sufficiently quickly with the sample size. The paper shows numerically that a modified approach based on modern multiple imputation can produce better estimates, and it proposes a new imputation method that combines the ideas of MissForest (random-forest-based imputation) and local linear forests (LLF). The new method is compared with PACE (Principal Analysis by Conditional Expectation) and with several multivariate multiple imputation methods, including MissForest and MICE (Multivariate Imputation by Chained Equations).

The paper points out that single imputation methods, such as mean imputation or PACE, are useful but do not account for the uncertainty introduced by imputation, which can distort uncertainty measures and introduce bias. Multiple imputation addresses this by filling in the missing values several times to create multiple "complete" datasets, so that the variation across datasets reflects the uncertainty of the imputation process. The authors implemented these methods using the mice and missForest packages in R and incorporated local linear forests into the new imputation method to exploit the smoothness of functional data.

Using a longitudinal smoking cessation study as an example, the paper explores the relationship between diastolic blood pressure and relapse, and it validates the methods through simulations with varying levels of sparsity. The study also highlights the limitations of existing Functional Data Analysis (FDA) methods for complex models: with sparse functional data, current methods may not be applicable to estimating nonlinear models.
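
The multiple imputation workflow described above (impute several times, analyze each completed dataset, then combine the results) can be illustrated with a small sketch. This is not the authors' implementation, which uses R packages; it is a minimal Python illustration under stated assumptions. The imputation step is an approximate Bayesian bootstrap over the observed values, chosen only for brevity, and the function names `impute_once`, `pool_rubin`, and `multiply_impute_mean` are hypothetical. The pooling step uses Rubin's rules, the standard way to combine estimates across imputed datasets.

```python
import random
import statistics


def impute_once(values, rng):
    """Fill each None using an approximate Bayesian bootstrap (an assumption
    made for this sketch): resample the observed values with replacement,
    then draw each missing value from that resample."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("need at least one observed value")
    donors = rng.choices(observed, k=len(observed))
    return [v if v is not None else rng.choice(donors) for v in values]


def pool_rubin(estimates, variances):
    """Combine per-dataset estimates and variances with Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                 # pooled point estimate
    within = sum(variances) / m                # average within-imputation variance
    between = statistics.variance(estimates) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between     # total variance of q_bar
    return q_bar, total


def multiply_impute_mean(values, m=20, seed=0):
    """Estimate a mean from data with missing entries via m imputations."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(values, rng)
        estimates.append(statistics.fmean(completed))
        # Variance of the sample mean within this completed dataset.
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)


if __name__ == "__main__":
    data = [1.0, None, 3.0, None, 5.0]
    estimate, total_variance = multiply_impute_mean(data, m=10, seed=2)
    print(estimate, total_variance)
```

The between-imputation term is what a single imputation cannot provide: it measures how much the estimate changes from one completed dataset to the next, and it is added to the ordinary within-dataset variance.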
The paper also proposes improved imputation strategies, including binning and careful initialization, to improve estimation for both linear and nonlinear models. In short, the paper aims to handle sparse and irregularly sampled data effectively in Functional Data Analysis, improving estimation accuracy for complex nonlinear models while accounting for the uncertainty introduced by imputation.
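
To make the binning idea concrete, the sketch below shows one plausible way to turn irregularly sampled curves into a common grid with missing entries, which is the form that multivariate imputation methods expect. The summary does not specify how the paper bins the data, so the equal-width bins, the within-bin averaging, and the names `bin_curve` and `bin_curves` are all assumptions made for illustration.

```python
def bin_curve(times, values, n_bins, t_min=0.0, t_max=1.0):
    """Average one curve's observations within equal-width time bins.

    Assumes every time lies in [t_min, t_max]. Bins with no observation
    are returned as None, i.e. as missing values to be imputed later.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    width = (t_max - t_min) / n_bins
    for t, y in zip(times, values):
        # An observation exactly at t_max is placed in the last bin.
        idx = min(int((t - t_min) / width), n_bins - 1)
        sums[idx] += y
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def bin_curves(curves, n_bins, t_min=0.0, t_max=1.0):
    """Bin a list of (times, values) pairs into a curves-by-bins grid."""
    return [bin_curve(t, y, n_bins, t_min, t_max) for t, y in curves]


if __name__ == "__main__":
    # Two sparsely observed curves on [0, 1], binned onto a 4-point grid.
    curves = [
        ([0.05, 0.07, 0.95], [1.0, 3.0, 5.0]),
        ([0.40], [2.0]),
    ]
    for row in bin_curves(curves, n_bins=4):
        print(row)
```

Each row of the resulting grid corresponds to one subject and each column to one time bin, so the sparsity of the original sampling shows up directly as the share of missing cells.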

Modern Multiple Imputation with Functional Data

Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models with Local Dependence

A Bayesian Functional Data Model for Surveys Collected under Informative Sampling with Application to Mortality Estimation using NHANES

Ultra-efficient MCMC for Bayesian longitudinal functional data analysis

Bayesian Analysis of Multidimensional Functional Data

Single and multiple index functional regression models with nonparametric link

Multiple imputation in functional regression with applications to EEG data in a depression study

Double robust estimation of functional outcomes with data missing at random

Robust Functional Regression with Discretely Sampled Predictors

Multiple imputation for missing data: fully conditional specification versus multivariate normal imputation

Multiple Imputation of Hierarchical Nonlinear Time Series Data with an Application to School Enrollment Data

Functional Linear Mixed Models for Irregularly or Sparsely Sampled Data

Index Models for Sparsely Sampled Functional Data

Multiple Imputation Method for High-Dimensional Neuroimaging Data

Highly Irregular Functional Generalized Linear Regression with Electronic Health Records

Bayesian latent factor regression for multivariate functional data with variable selection

Infinite hidden Markov models for multiple multivariate time series with missing data

Modeling Multivariate Mixed-Response Functional Data

Multiple imputation of missing covariates when using the Fine-Gray model

Scalar-on-function regression: Estimation and inference under complex survey designs

Functional Causal Inference with Time-to-Event Data