Causal Domain Adaptation with Copula Entropy based Conditional Independence Test

Jian Ma
DOI: https://doi.org/10.48550/arXiv.2202.13482
2022-02-28
Abstract: Domain Adaptation (DA) is a typical problem in machine learning that aims to transfer a model trained on a source domain to a target domain with a different distribution. Causal DA is a special case of DA that solves the problem from the view of causality. It embeds the probabilistic relationships in multiple domains in a larger causal structure network of a system and tries to find the causal source (or intervention) on the system that explains the distribution drift of the system states across domains. In this sense, causal DA is transformed into a causal discovery problem that finds an invariant representation across domains through the conditional independence between the state variables and the observable state of the system given the interventions. Testing conditional independence is the cornerstone of causal discovery. Recently, a copula entropy based conditional independence test was proposed with a rigorous theory and a non-parametric estimation method. In this paper, we first present a mathematical model for the causal DA problem and then propose a method for causal DA that finds the invariant representation across domains with the copula entropy based conditional independence test. The effectiveness of the method is verified on two simulated datasets. The power of the proposed method is then demonstrated on two real-world datasets: adult census income data and gait characteristics data.
Machine Learning, Methodology
What problem does this paper attempt to address?
The paper addresses the Causal Domain Adaptation (Causal DA) problem within Domain Adaptation (DA). Specifically, it approaches the model transfer problem caused by differences in data distribution between domains from the perspective of causality.

### Problem Background

In machine learning, domain adaptation is a typical problem whose goal is to transfer a model trained in the source domain to the target domain, even when the data distributions of the two domains differ. Because of these distribution differences, directly applying the source-domain model to the target domain usually degrades performance. The goal of domain adaptation is therefore to learn a model that generalizes across domains.

### Causal Domain Adaptation (Causal DA)

Causal domain adaptation is a special case of domain adaptation that solves the problem from the perspective of causality. Causal DA treats the multiple domains as parts of one system with a larger causal structure, and attributes the distribution changes between domains to causal sources (or interventions) acting on that system. Causal DA can therefore be transformed into a causal discovery problem: finding invariant representations across domains through conditional independence tests.

### Specific Problem Description

The paper proposes a method for causal domain adaptation built on a copula entropy based conditional independence test. Specifically, it addresses the following tasks:

1. **Constructing a Mathematical Model**: Establish a mathematical model of the causal domain adaptation problem, involving state variables, observable variables, and intervention variables.
2. **Finding Invariant Representations**: Find invariant representations across domains by testing the conditional independence between the state variables and the observable state of the system given the interventions.
3. **Verifying the Effectiveness of the Method**: Verify the proposed method on simulated data and on real-world data (adult census income data and gait characteristics data).

### Application Examples

- **Adult Census Income Data**: Study the impact of gender on education and income inequality, and verify the hypothesis through conditional independence tests.
- **Gait Characteristics Data**: Study the relationship between gait features and the Timed Up and Go (TUG) score in two scenarios, the TUG test and daily life, and find gait features that are stable in both.

### Method Advantages

Compared with other domain adaptation methods, causal domain adaptation can explain the differences between domains and find truly invariant representations through causality, rather than simply looking for "invariant" correlations.

### Conclusion

The paper proposes a copula entropy based conditional independence test method for the causal domain adaptation problem and verifies it through simulation experiments and real data. The reported results indicate that the method discovers invariant causal relationships across domains that are reasonable and meaningful.
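To make the role of the conditional independence test more concrete, the sketch below computes a conditional dependence statistic from copula entropies, using the identity I(X;Y|Z) = Hc(X,Z) + Hc(Y,Z) - Hc(X,Y,Z) - Hc(Z), where Hc denotes copula entropy (the negative of multivariate mutual information). It is not the paper's estimator: the paper relies on a non-parametric estimation method, whereas this example assumes a Gaussian copula so that Hc reduces to half the log-determinant of the correlation matrix of the normal scores. The function names (`normal_scores`, `gaussian_copula_entropy`, `conditional_dependence`) and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np
from statistics import NormalDist


def normal_scores(data):
    """Map each column to standard-normal scores via its empirical ranks."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    ranks = data.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n per column
    u = ranks / (n + 1.0)                             # strictly inside (0, 1)
    return np.vectorize(NormalDist().inv_cdf)(u)


def gaussian_copula_entropy(data):
    """Copula entropy under an ASSUMED Gaussian copula: 0.5 * log det(R)."""
    data = np.asarray(data, dtype=float)
    if data.shape[1] < 2:
        return 0.0  # a single variable carries no dependence structure
    corr = np.corrcoef(normal_scores(data), rowvar=False)
    _, logdet = np.linalg.slogdet(corr)
    return 0.5 * logdet


def conditional_dependence(x, y, z):
    """Estimate I(X;Y|Z) = Hc(X,Z) + Hc(Y,Z) - Hc(X,Y,Z) - Hc(Z)."""
    x, y, z = (np.asarray(a, dtype=float).reshape(len(a), -1) for a in (x, y, z))
    hc = gaussian_copula_entropy
    return (hc(np.hstack([x, z])) + hc(np.hstack([y, z]))
            - hc(np.hstack([x, y, z])) - hc(z))


# Illustrative simulated data (hypothetical, not the paper's experiments).
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y_indep = z + 0.5 * rng.normal(size=n)  # depends on x only through z
y_dep = x + 0.5 * rng.normal(size=n)    # depends on x directly

ci_indep = conditional_dependence(x, y_indep, z)
ci_dep = conditional_dependence(x, y_dep, z)
print(f"x vs y_indep given z: {ci_indep:.4f}")
print(f"x vs y_dep given z:   {ci_dep:.4f}")
```

A statistic close to zero is consistent with conditional independence, while a clearly positive value indicates that the dependence remains after conditioning. In a causal DA setting, the conditioning variable would encode the intervention, which is an interpretation of the summary above rather than code taken from the paper.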