Cross-validating causal discovery via Leave-One-Variable-Out

Daniela Schkoda, Philipp Faller, Patrick Blöbaum, Dominik Janzing
2024-11-08
Abstract: We propose a new approach to falsify causal discovery algorithms without ground truth, which is based on testing the causal model on a pair of variables that has been dropped when learning the causal model. To this end, we use the "Leave-One-Variable-Out (LOVO)" prediction where $Y$ is inferred from $X$ without any joint observations of $X$ and $Y$, given only training data from $X,Z_1,\dots,Z_k$ and from $Z_1,\dots,Z_k,Y$. We demonstrate that causal models on the two subsets, in the form of Acyclic Directed Mixed Graphs (ADMGs), often entail conclusions on the dependencies between $X$ and $Y$, enabling this type of prediction. The prediction error can then be estimated since the joint distribution $P(X, Y)$ is assumed to be available, and $X$ and $Y$ have only been omitted for the purpose of falsification. After presenting this graphical method, which is applicable to general causal discovery algorithms, we illustrate how to construct a LOVO predictor tailored towards algorithms relying on specific a priori assumptions, such as linear additive noise models. Simulations indicate that the LOVO prediction error is indeed correlated with the accuracy of the causal outputs, affirming the method's effectiveness.
Machine Learning, Methodology
What problem does this paper attempt to address?
This paper addresses how to evaluate causal discovery algorithms when no ground truth is available. Specifically, the authors propose a method for assessing the accuracy of causal discovery algorithms without relying on known true causal relationships.

#### Background and Motivation

Causal discovery infers causal relationships from observational data and has received extensive attention in recent years. In practice, however, researchers find it hard to determine which causal discovery method suits their application, or whether any method solves their problem adequately. Most evaluations rely on simulated data, which may not reflect real-world complexity. Experimental datasets that can be used to evaluate causal models (such as gene knockout experiments) exist, but they are few and costly to acquire. A method that evaluates causal discovery algorithms without ground truth is therefore needed.

#### Proposed Method: Leave-One-Variable-Out (LOVO) Cross-Validation

The authors propose a cross-validation method named "Leave-One-Variable-Out (LOVO)". The core idea is to test the learned causal models on a pair of variables that was never observed jointly during learning: \(Y\) is predicted from \(X\) using only data on \((X, Z)\) and on \((Y, Z)\). The steps are as follows:

1. **Select a pair of variables**: Select a pair of variables \((X, Y)\) from the set of all variables \(W\).
2. **Learn the causal models**: Run the causal discovery algorithm separately on the subsets \((X, Z)\) and \((Y, Z)\), where \(Z\) denotes all remaining variables. This yields the causal models \(G_X\) and \(G_Y\).
3. **Construct the predictor**: From \(G_X\) and \(G_Y\), construct a LOVO predictor for \(P(Y|X)\) or for the conditional expectation \(E[Y|X = x]\).
4. **Estimate the prediction error**: Compare the output of the LOVO predictor with the estimate obtained from the joint distribution \(P(X, Y)\) and calculate the prediction error.
5. **Repeat the verification**: Repeat the above process for all possible pairs of variables \((X, Y)\) to obtain the overall LOVO cross-validation error.

In this way, researchers can evaluate causal discovery algorithms without ground truth and can choose among algorithms by comparing their LOVO errors.

#### Main Contributions

1. **No ground truth required**: The LOVO method evaluates the accuracy of causal discovery algorithms without ground truth.
2. **Applicable to general causal discovery algorithms**: The graphical method applies to general causal discovery algorithms, and the LOVO predictor can also be tailored to algorithms that rely on specific a priori assumptions, such as linear additive noise models.
3. **Provide a baseline**: A MaxEnt baseline predictor is introduced as a reference to help determine whether causal information actually improves prediction performance.
4. **Theoretical and empirical analysis**: The method is supported by theoretical derivation and by simulation experiments, which indicate that the LOVO prediction error is correlated with the accuracy of the causal outputs.

In summary, this paper proposes a method for evaluating causal discovery algorithms without ground truth, providing a new tool for research in causal inference.
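The five steps of the proposed method can be sketched roughly as the loop below. This is a minimal illustration under strong assumptions, not the authors' implementation. It skips the causal discovery step and instead plugs in a hypothetical stand-in predictor, `build_linear_ci_predictor`, which assumes linear relations and that \(X\) is independent of \(Y\) given \(Z\). Under that assumption, Cov(X, Y) = Cov(X, Z) Cov(Z)^{-1} Cov(Z, Y), so the slope of \(E[Y|X = x]\) can be computed without ever seeing \(X\) and \(Y\) together. The function names, the ordinary-least-squares reference fit, and the squared-difference error are all choices made for this sketch.

```python
import itertools
import numpy as np


def _cov(a, b):
    """Sample cross-covariance between the columns of a (n, p) and b (n, q)."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return a.T @ b / (len(a) - 1)


def build_linear_ci_predictor(data_xz, data_zy):
    """Stand-in LOVO predictor (an assumption, not the paper's construction).

    Assumes linear relations and X independent of Y given Z, so that
    Cov(X, Y) = Cov(X, Z) Cov(Z)^{-1} Cov(Z, Y).
    data_xz has columns [X, Z_1..Z_k]; data_zy has columns [Z_1..Z_k, Y].
    """
    x, z_a = data_xz[:, :1], data_xz[:, 1:]
    z_b, y = data_zy[:, :-1], data_zy[:, -1:]
    z_all = np.vstack([z_a, z_b])  # Z is observed in both datasets
    cov_xy = _cov(x, z_a) @ np.linalg.solve(_cov(z_all, z_all), _cov(z_b, y))
    slope = cov_xy.item() / x.var(ddof=1)
    intercept = y.mean() - slope * x.mean()
    return lambda x_new: intercept + slope * np.asarray(x_new)


def lovo_errors(data, build_predictor=build_linear_ci_predictor):
    """Mean squared gap between LOVO and joint-data predictions, per ordered pair."""
    n_vars = data.shape[1]
    if n_vars < 3:
        raise ValueError("need at least one remaining variable Z")
    errors = {}
    for i, j in itertools.permutations(range(n_vars), 2):
        rest = [c for c in range(n_vars) if c not in (i, j)]
        # The predictor never sees X and Y in the same table.
        predict = build_predictor(data[:, [i] + rest], data[:, rest + [j]])
        # Reference estimate of E[Y | X = x] from the joint sample (OLS line).
        x, y = data[:, i], data[:, j]
        reference = np.polyval(np.polyfit(x, y, 1), x)
        errors[(i, j)] = float(np.mean((predict(x) - reference) ** 2))
    return errors


if __name__ == "__main__":
    # Toy chain X -> Z -> Y, where X is independent of Y given Z.
    rng = np.random.default_rng(0)
    x = rng.normal(size=5000)
    z = 2 * x + rng.normal(size=5000)
    y = -z + rng.normal(size=5000)
    for pair, err in lovo_errors(np.column_stack([x, z, y])).items():
        print(pair, round(err, 4))
```

In the toy chain, the stand-in assumption holds for the pair made of the first and last columns, so that pair should show a small error. For pairs where it fails, such as \(X\) and \(Z\) given \(Y\), the error should be visibly larger. A real application would replace the stand-in with a predictor derived from the causal models learned on each subset.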