DeepResBat: deep residual batch harmonization accounting for covariate distribution differences

Lijun An, Chen Zhang, Naren Wulan, Shaoshi Zhang, Pansheng Chen, Fang Ji, Kwun Kei Ng, Christopher Chen, Juan Helen Zhou, B.T. Thomas Yeo, Alzheimer’s Disease Neuroimaging Initiative, Australian Imaging Biomarkers and Lifestyle Study of Aging
DOI: https://doi.org/10.1101/2024.01.18.574145
2024-01-20
Abstract: Pooling MRI data from multiple datasets requires harmonization to reduce undesired inter-site variabilities, while preserving effects of biological variables (or covariates). The popular harmonization approach ComBat uses a mixed effect regression framework that explicitly accounts for covariate distribution differences across datasets. There is also significant interest in developing harmonization approaches based on deep neural networks (DNNs), such as the conditional variational autoencoder (cVAE). However, current DNN approaches do not explicitly account for covariate distribution differences across datasets. Here, we provide mathematical results suggesting that not accounting for covariates can lead to suboptimal harmonization outcomes. We propose two DNN-based harmonization approaches that explicitly account for covariate distribution differences across datasets: covariate VAE (coVAE) and DeepResBat. The coVAE approach is a natural extension of cVAE that concatenates covariates and site information with site- and covariate-invariant latent representations. DeepResBat adopts a residual framework inspired by ComBat. DeepResBat first removes the effects of covariates with nonlinear regression trees, then eliminates site differences with cVAE. Finally, covariate effects are added back to the harmonized residuals. Using three datasets from three different continents with a total of 2787 participants and 10085 anatomical T1 scans, we find that DeepResBat and coVAE outperformed ComBat, CovBat and cVAE in terms of removing dataset differences, while enhancing biological effects of interest. However, coVAE hallucinates spurious associations between anatomical MRI and covariates even when no association exists. Therefore, future studies proposing DNN-based harmonization approaches should be aware of this false positive pitfall. Overall, our results suggest that DeepResBat is an effective deep learning alternative to ComBat.
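The three-step residual framework described in the abstract (remove covariate effects, harmonize the residuals, add covariate effects back) can be illustrated with a minimal sketch. This is not the authors' implementation: a closed-form linear fit stands in for the nonlinear regression trees, per-site mean centering stands in for the cVAE, and all function names (`fit_linear`, `center_by_site`, `residual_harmonize`) and the toy data are hypothetical.

```python
from statistics import mean


def fit_linear(x, y):
    """Simple linear regression (ASSUMPTION: stand-in for regression trees).

    Returns a function that predicts the covariate effect for a value of x.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return lambda xi: intercept + slope * xi


def center_by_site(residuals, sites):
    """Subtract each site's mean residual (ASSUMPTION: stand-in for the cVAE)."""
    site_means = {
        s: mean(r for r, t in zip(residuals, sites) if t == s)
        for s in set(sites)
    }
    return [r - site_means[s] for r, s in zip(residuals, sites)]


def residual_harmonize(feature, covariate, sites):
    """Three-step residual harmonization of a single MRI feature."""
    # Step 1: estimate covariate effects and remove them from the feature.
    predict = fit_linear(covariate, feature)
    cov_effect = [predict(c) for c in covariate]
    residuals = [f - e for f, e in zip(feature, cov_effect)]
    # Step 2: remove site differences from the residuals.
    harmonized_residuals = center_by_site(residuals, sites)
    # Step 3: add the covariate effects back to the harmonized residuals.
    return [e + r for e, r in zip(cov_effect, harmonized_residuals)]


# Toy data: feature = 2 * age + site offset (site A: 0, site B: 5).
ages = [60, 70, 80, 60, 70, 80]
sites = ["A", "A", "A", "B", "B", "B"]
feature = [120, 140, 160, 125, 145, 165]
harmonized = residual_harmonize(feature, ages, sites)
```

In this toy case the age distribution is identical at both sites, so the pooled linear fit recovers the age slope and the site offset ends up entirely in the residuals. After harmonization, participants of the same age have matching values across sites while the age effect is retained. When covariate distributions differ across sites, which is the setting the paper targets, this simplified stand-in would not be sufficient.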
Neuroscience