Stable Prediction with Leveraging Seed Variable
Kun Kuang, Haotian Wang, Yue Liu, Ruoxuan Xiong, Runze Wu, Weiming Lu, Fei Wu, Peng Cui, Bo Li, Yue Ting Zhuang
DOI: https://doi.org/10.1109/tkde.2022.3169333
IF: 9.235
2022-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: In this paper, we focus on the problem of stable prediction across unknown test data, where the test distribution might differ from the training distribution and is always unknown during model training. In such a case, previous machine learning methods might exploit subtle spurious correlations induced by non-causal variables in the training data for prediction. Those spurious correlations can vary across datasets, leading to unstable prediction across unknown test data. To address this problem, we propose an algorithm based on conditional independence tests that screens out non-causal features and reduces spurious correlations by leveraging a seed variable. We show, both theoretically and empirically, that our algorithm can precisely screen out the isolated non-causal variables, which have no causal relationship with other variables, and remove the spurious correlations induced by them, increasing the stability of prediction across unknown test data. Extensive experiments on both synthetic and real-world datasets demonstrate that our algorithm outperforms state-of-the-art methods for stable prediction across unknown test data.
computer science, information systems, artificial intelligence, engineering, electrical & electronic