Nonparametric causal discovery with applications to cancer bioinformatics

Jean Pierre Gomez
2024-01-08
Abstract: Many natural phenomena are intrinsically causal. Discovering the cause-effect relationships implicit in these processes can help us understand and describe them more effectively, which amounts to performing causal discovery on the data and variables that describe them. However, causal discovery is not an easy task. Current methods are extremely complex and costly, and their usefulness is strongly compromised in contexts with large amounts of data or where the nature of the variables involved is unknown. As an alternative, this paper presents an original methodology for causal discovery, built on essential aspects of the main theories of causality, in particular probabilistic causality, with many points of contact with the inferential approach of regularity theories and others. Based on this methodology, a non-parametric algorithm is developed for discovering causal relationships between binary variables associated with data sets, and for modeling as graphs the causal networks they describe. This algorithm is applied to gene expression data sets from normal and cancerous prostate tissues, with the aim of discovering cause-effect relationships between gene dysregulations leading to carcinogenesis. The gene characterizations constructed from the discovered causal relationships are compared with those of another study based on principal component analysis (PCA) of the same data, with satisfactory results.
Quantitative Methods
What problem does this paper attempt to address?
The paper primarily aims to address two core issues:

1. **Proposing a new non-parametric causal discovery method**: In response to the complexity and cost issues of current causal discovery algorithms when dealing with large datasets or unknown variable properties, the paper proposes an original non-parametric causal discovery method. This method focuses on detecting sufficient causal relationships between variables in the dataset and is capable of constructing graphical models of these causal relationships.
2. **Applying this method to identify gene regulatory networks in cancer**: Specifically, the research focuses on prostate cancer, aiming to discover the causal relationships between genes that lead to and regulate the occurrence of cancer by analyzing gene expression data. The goal is to identify genes that can serve as cancer indicators (i.e., their variations can indicate the presence of cancer) and genes that may become therapeutic targets (i.e., their expression inhibition or replacement may help in disease regulation and potential cure).

In short, this paper proposes a new causal discovery framework aimed at detecting causal links between variables in datasets, with a particular application in identifying gene regulatory networks related to prostate cancer.
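
To make the idea of a sufficient causal relationship between binary variables more concrete, here is a minimal Python sketch. It is not the paper's algorithm, whose details are not given here. It assumes a simple probabilistic reading: a candidate edge X -> Y is kept when Y is almost always present whenever X is present, and X raises the probability of Y. The function name `sufficiency_edges`, the `threshold` parameter, and the toy data are all hypothetical.

```python
from itertools import permutations


def sufficiency_edges(data, names, threshold=0.9):
    """Return candidate edges (cause, effect, P(effect=1 | cause=1)).

    Illustrative assumption, not the paper's method: keep X -> Y when
    P(Y=1 | X=1) >= threshold and P(Y=1 | X=1) > P(Y=1).

    data  : list of rows, each a list of 0/1 values (one column per name)
    names : column labels, in the same order as the columns
    """
    n = len(data)
    edges = []
    for i, j in permutations(range(len(names)), 2):
        n_x = sum(row[i] for row in data)
        if n_x == 0:
            continue  # X never occurs, so nothing can be estimated
        n_xy = sum(1 for row in data if row[i] == 1 and row[j] == 1)
        p_y_given_x = n_xy / n_x                  # estimate of P(Y=1 | X=1)
        p_y = sum(row[j] for row in data) / n     # estimate of P(Y=1)
        if p_y_given_x >= threshold and p_y_given_x > p_y:
            edges.append((names[i], names[j], p_y_given_x))
    return edges


# Toy binary data: each row is a sample, columns are variables A, B, C.
rows = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
]
print(sufficiency_edges(rows, ["A", "B", "C"]))
```

In the toy data, B is present in every sample where A is present, while the reverse does not hold, so only the edge from A to B passes the criterion. The resulting edge list can be read as a directed graph. A co-occurrence criterion of this kind only suggests candidate relationships and does not by itself establish causation.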