Differentiable and Scalable Generative Adversarial Models for Data Imputation

Yangyang Wu, Jun Wang, Xiaoye Miao, Wenjia Wang, Jianwei Yin
DOI: https://doi.org/10.1109/tkde.2023.3293129
IF: 9.235
2023-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Data imputation has been extensively explored to solve the missing data problem. The dramatically increasing volume of incomplete data makes imputation models computationally infeasible in many real-life applications. In this paper, we propose an effective scalable imputation system named SCIS to significantly speed up the training of differentiable generative adversarial imputation models with accuracy guarantees for large-scale incomplete data. SCIS consists of two modules: differentiable imputation modeling (DIM) and sample size estimation (SSE). DIM leverages a new masking Sinkhorn divergence function to make an arbitrary generative adversarial imputation model differentiable, while for such a differentiable imputation model, SSE can estimate an appropriate sample size to ensure the user-specified imputation accuracy of the final model. Moreover, SCIS can also accelerate autoencoder-based imputation models. Extensive experiments on several real-life large-scale datasets demonstrate that our proposed system can accelerate generative adversarial model training by 6.23x. Using around 1.27% of the samples, SCIS yields accuracy competitive with state-of-the-art imputation methods in much shorter computation time.
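To make the idea of a masking Sinkhorn divergence more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the divergence is the standard debiased Sinkhorn divergence (entropic optimal transport with a squared Euclidean cost) computed between the imputed batch and the original batch after both are multiplied by the observation mask, so that only observed entries are compared. The function names, the regularization strength `eps`, and the iteration count `n_iters` are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def _logsumexp(z, axis):
    """Numerically stable log(sum(exp(z))) along one axis."""
    z_max = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(z_max, axis=axis) + np.log(np.sum(np.exp(z - z_max), axis=axis))


def entropic_ot(x, y, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between two uniformly weighted point clouds.

    Returns the dual objective value after n_iters log-domain Sinkhorn updates.
    """
    n, m = x.shape[0], y.shape[0]
    # Pairwise squared Euclidean cost matrix of shape (n, m).
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=2)
    log_a, log_b = -np.log(n), -np.log(m)  # uniform weights in log space
    f, g = np.zeros(n), np.zeros(m)        # dual potentials
    for _ in range(n_iters):
        f = -eps * _logsumexp((g[None, :] - cost) / eps + log_b, axis=1)
        g = -eps * _logsumexp((f[:, None] - cost) / eps + log_a, axis=0)
    return f.mean() + g.mean()


def sinkhorn_divergence(x, y, eps=0.1, n_iters=200):
    """Debiased Sinkhorn divergence: OT(x, y) - OT(x, x) / 2 - OT(y, y) / 2."""
    return (entropic_ot(x, y, eps, n_iters)
            - 0.5 * entropic_ot(x, x, eps, n_iters)
            - 0.5 * entropic_ot(y, y, eps, n_iters))


def masking_sinkhorn_divergence(imputed, data, mask, eps=0.1, n_iters=200):
    """Sinkhorn divergence restricted to observed entries (assumed formulation).

    mask is True where a value is observed; missing entries (possibly NaN in
    `data`) are zeroed in both inputs so they cannot influence the loss.
    """
    masked_imputed = np.where(mask, imputed, 0.0)
    masked_data = np.where(mask, data, 0.0)
    return sinkhorn_divergence(masked_imputed, masked_data, eps, n_iters)


# Toy usage on synthetic data (illustrative only, not from the paper).
rng = np.random.default_rng(0)
data = rng.normal(size=(8, 3))
mask = rng.random((8, 3)) > 0.3
data_missing = np.where(mask, data, np.nan)   # NaN marks missing cells
good = np.where(mask, data, 0.5)              # agrees with every observed cell
bad = good + 2.0 * mask                       # shifted on observed cells
loss_good = masking_sinkhorn_divergence(good, data_missing, mask)
loss_bad = masking_sinkhorn_divergence(bad, data_missing, mask)
```

In this sketch an imputation that reproduces every observed value gets zero loss regardless of what it fills into the missing cells, while one that distorts observed values is penalized. Because every step is a smooth function of the imputed values, the same computation written in an automatic differentiation framework would provide gradients for training a generator, which is the property the abstract attributes to DIM.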
computer science, information systems, artificial intelligence, engineering, electrical & electronic