Abstract: Data increasingly take the form of a multi-way array, or tensor, in several biomedical domains. Such tensors are often incompletely observed. For example, we are motivated by longitudinal microbiome studies in which several timepoints are missing for several subjects. There is a growing literature on missing data imputation for tensors. However, existing methods give a point estimate for missing values without capturing uncertainty. We propose a multiple imputation approach for tensors in a flexible Bayesian framework that yields realistic simulated values for missing entries and can propagate uncertainty through subsequent analyses. Our model uses efficient and widely applicable conjugate priors for a CANDECOMP/PARAFAC (CP) factorization, with a separable residual covariance structure. This approach is shown to perform well with respect to both imputation accuracy and uncertainty calibration, for scenarios in which either single entries or entire fibers of the tensor are missing. For two microbiome applications, it is shown to accurately capture uncertainty in the full microbiome profile at missing timepoints and used to infer trends in species diversity for the population. Documented R code to perform our multiple imputation approach is available at https://github.com/lockEF/MultiwayImputation.
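To make the two missingness scenarios in the abstract concrete, the following is a minimal Python sketch, not the authors' R implementation. It builds a noise-free rank-2 CP tensor and masks both scattered single entries and one entire fiber. The mode ordering (subjects, features, timepoints), the tensor sizes, and all function names are assumptions chosen for illustration; the residual covariance structure is omitted.

```python
import itertools
import random


def cp_entry(factors, index):
    """Entry of a CP tensor: sum over components of the product of factor entries."""
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        term = 1.0
        for mode, i in enumerate(index):
            term *= factors[mode][i][r]
        total += term
    return total


def random_factors(dims, rank, rng):
    """One (dims[mode] x rank) factor matrix per mode, with standard normal entries."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(d)]
            for d in dims]


def entrywise_mask(dims, n_missing, rng):
    """Indices of n_missing single entries chosen uniformly at random."""
    all_indices = list(itertools.product(*(range(d) for d in dims)))
    return set(rng.sample(all_indices, n_missing))


def fiber_mask(dims, subject, timepoint):
    """Indices of a whole fiber: every feature for one subject at one timepoint."""
    return {(subject, feature, timepoint) for feature in range(dims[1])}


rng = random.Random(0)
dims = (4, 3, 5)  # hypothetical sizes: subjects x features x timepoints
factors = random_factors(dims, rank=2, rng=rng)

# Combine scattered missing entries with one fully missing timepoint for subject 1.
missing = entrywise_mask(dims, 6, rng) | fiber_mask(dims, subject=1, timepoint=2)

# Only the observed entries would be available to an imputation model.
observed = {
    idx: cp_entry(factors, idx)
    for idx in itertools.product(*(range(d) for d in dims))
    if idx not in missing
}
```

With a whole fiber missing, no entry for that subject and timepoint is observed, so any imputation has to borrow information from the other subjects and timepoints through the shared factors.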


### What problem does this paper attempt to solve?

This paper addresses multiple imputation of missing values in multi-way data (tensors), with a focus on biomedical applications. Specifically, the authors aim to provide not only point estimates for missing entries but also an accurate representation of their uncertainty. The main problems and objectives of the study are as follows:

1. **Limitations of existing methods**:
   - Most existing tensor completion methods give only a point estimate for each missing value, without capturing uncertainty.
   - Treating such point estimates as observed data understates uncertainty in subsequent analyses, which can compromise inference.
2. **Research objectives**:
   - Propose a Bayesian multiple imputation method (BAMITA) that generates multiple simulated values reflecting the uncertainty in missing entries.
   - Sample from the posterior predictive distribution so that uncertainty propagates correctly into subsequent analyses.
   - Use a CANDECOMP/PARAFAC (CP) factorization with conjugate priors for computational efficiency and wide applicability.
   - Adopt a separable covariance structure for the residuals to capture correlations along different modes.
3. **Specific application scenarios**:
   - In longitudinal microbiome studies, microbial abundance data for some subjects at some timepoints are entirely missing (missing fibers).
   - The method can be used to capture uncertainty in the full microbiome profile at missing timepoints and to infer trends in species diversity for the population.

### Main contributions

- **Multiple imputation**: A new Bayesian multiple imputation method is proposed that captures uncertainty while imputing missing values.
- **Uncertainty propagation**: Uncertainty is propagated correctly through subsequent analyses, improving the validity and reliability of inference.
- **Efficient algorithm**: An efficient MCMC sampling algorithm, combined with CP decomposition and a separable covariance structure, makes the model suitable for large-scale and high-dimensional data.
- **Empirical verification**: Simulation experiments and analyses of real microbiome data show that the method performs well in both imputation accuracy and uncertainty calibration.

In conclusion, by proposing BAMITA, this paper addresses a key limitation of existing tensor completion methods, namely that they do not capture uncertainty, and provides a more reliable approach for data analysis in the biomedical field.
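As an illustration of how uncertainty can be carried into a downstream diversity analysis, here is a minimal Python sketch. It assumes several imputed abundance profiles are already available for one missing timepoint, computes a diversity index on each, and summarizes the spread. The Shannon index, the toy profiles, and all function names are assumptions; the source states only that the method is used to infer trends in species diversity.

```python
import math
import statistics


def shannon_diversity(abundances):
    """Shannon index of a non-negative abundance profile (zero entries are skipped)."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def empirical_quantile(sorted_values, q):
    """Quantile by linear interpolation between neighboring order statistics."""
    position = q * (len(sorted_values) - 1)
    low = math.floor(position)
    high = math.ceil(position)
    fraction = position - low
    return sorted_values[low] * (1 - fraction) + sorted_values[high] * fraction


def summarize_draws(imputed_profiles, lower=0.025, upper=0.975):
    """Apply the analysis to every imputed profile, then summarize across draws."""
    values = sorted(shannon_diversity(p) for p in imputed_profiles)
    return {
        "mean": statistics.mean(values),
        "lower": empirical_quantile(values, lower),
        "upper": empirical_quantile(values, upper),
    }


# Hypothetical stand-ins for imputed draws of one missing profile.
imputed_profiles = [
    [10, 5, 1],
    [8, 6, 2],
    [12, 3, 1],
]
summary = summarize_draws(imputed_profiles)
```

The interval width in `summary` reflects how much the imputed draws disagree. A single point estimate would collapse it to zero, which is the understated uncertainty described above.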

BAMITA: Bayesian Multiple Imputation for Tensor Arrays