Conditional Independence Test Based on Residual Similarity
Hao Zhang†, Yewei Xia†, Kun Zhang, Shuigeng Zhou*, Jihong Guan
DOI: https://doi.org/10.1145/3593810
IF: 4.157
2023-04-25
ACM Transactions on Knowledge Discovery from Data
Abstract: Recently, many regression-based conditional independence (CI) test methods have been proposed to solve the problem of causal discovery. These methods test the CI of x and y given Z by first removing the information of the conditioning set Z from x and y, and then testing the independence between the two residuals R_{x,Z} and R_{y,Z}. When the residuals are linearly uncorrelated, testing their independence is nontrivial. Because they can compute inner products in a high-dimensional space, kernel-based methods are usually used to achieve this goal, but they are considerably time-consuming. In this paper, we test the independence between two linear combinations under a linear structural equation model. We show that the dependence between the two residuals can be captured by the difference between the similarity of R_{x,Z} and R_{y,Z} and that of R_{x,Z} and R_r (where R_r is an independent copy of R_{y,Z}) in a high-dimensional space. With this result, we provide a new way to test CI based on the similarity between residuals, called SCIT (Similarity-based CI Testing). Furthermore, we develop two versions of the proposal, called Kernel-SCIT and Neural-SCIT. Kernel-SCIT calculates the similarity using kernel functions, while Neural-SCIT approximates the upper bound of the similarity using deep neural networks. In both algorithms, random permutation tests are performed to control the Type I error rate. The proposed tests are evaluated on (conditional) independence testing and causal discovery with both synthetic and real datasets. Experimental results show that Kernel-SCIT is simpler yet more efficient and effective than the typical existing kernel-based methods HSIC and KCIT when the sample size is small, and that Neural-SCIT can significantly boost the performance of CI testing when sufficient samples are available. The source code is available at https://github.com/xyw5vplus1/SCIT.
computer science, information systems, software engineering
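The abstract's pipeline (regress out Z, compare the similarity of the paired residuals with the similarity against an independent copy, calibrate with random permutations) can be illustrated with a minimal sketch. This is not the authors' SCIT code, which lives in the repository linked above. The function names, the Gaussian kernel on paired differences, the fixed bandwidth, the use of a permuted R_{y,Z} as the independent copy R_r, and the two-sided p-value are all assumptions made here for illustration.

```python
import numpy as np


def residual(target, Z):
    """Residual of an ordinary least-squares fit of target on Z (with intercept)."""
    design = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def paired_similarity(a, b, bandwidth=1.0):
    """Mean Gaussian-kernel similarity between the paired samples a_i and b_i.

    Assumption: this particular similarity is chosen for illustration only.
    """
    return float(np.mean(np.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))))


def similarity_ci_test(x, y, Z, n_perm=500, seed=0):
    """Hypothetical similarity-based CI test of x and y given Z.

    Returns (similarity difference, permutation p-value).
    """
    rng = np.random.default_rng(seed)

    # Step 1: remove the (linear) information of Z from x and y.
    rx = residual(x, Z)
    ry = residual(y, Z)

    # Put both residuals on a common scale before comparing them.
    rx = rx / rx.std()
    ry = ry / ry.std()

    # Step 2: similarity of the paired residuals.
    observed = paired_similarity(rx, ry)

    # Step 3: similarity against an "independent copy" of ry,
    # approximated here by randomly permuting ry (an assumption).
    null = np.array(
        [paired_similarity(rx, rng.permutation(ry)) for _ in range(n_perm)]
    )
    center = null.mean()

    # Step 4: two-sided permutation p-value on the similarity difference.
    extreme = np.sum(np.abs(null - center) >= abs(observed - center))
    p_value = (1 + extreme) / (n_perm + 1)
    return observed - center, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 500
    Z = rng.normal(size=n)
    e1 = rng.uniform(-1, 1, size=n)  # non-Gaussian noise
    e2 = rng.uniform(-1, 1, size=n)
    x = Z + e1
    y = Z + 0.8 * x + e2  # y depends on x even after conditioning on Z
    print(similarity_ci_test(x, y, Z))
```

A small p-value suggests that the residuals are dependent, so the hypothesis that x and y are conditionally independent given Z is rejected. The permutation step is what keeps the Type I error rate under control in this sketch, mirroring the role the abstract assigns to it.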