Pathway analysis through mutual information

Gustavo S Jeuken, Lukas Käll
DOI: https://doi.org/10.1093/bioinformatics/btad776
IF: 5.8
2024-01-01
Bioinformatics
Abstract:
Motivation: In pathway analysis, we aim to establish a connection between the activity of a particular biological pathway and a difference in phenotype. There are many available methods to perform pathway analysis; many of them rely on an upstream differential expression analysis, and many model the relations between the abundances of the analytes in a pathway as linear relationships.
Results: Here, we propose a new method for pathway analysis, MIPath, that relies on information theoretical principles and, therefore, does not model the association between pathway activity and phenotype, resulting in relatively few assumptions. For this, we construct a graph of the data points for each pathway using a nearest-neighbor approach and score the association between the structure of this graph and the phenotype of these same samples using Mutual Information while adjusting for the effects of random chance in each score. The initial nearest neighbor approach evades individual gene-level comparisons, hence making the method scalable and less vulnerable to missing values. These properties make our method particularly useful for single-cell data. We benchmarked our method on several single-cell datasets, comparing it to established and new methods, and found that it produces robust, reproducible, and meaningful scores.
Availability and implementation: Source code is available at https://github.com/statisticalbiotechnology/mipath, or through Python Package Index as “mipathway.”
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
What problem does this paper attempt to address?
The paper addresses a central problem in pathway analysis: how to establish the connection between the activity of a biological pathway and a difference in phenotype. The authors propose a new method, MIPath (Mutual Information Pathway Analysis), based on information-theoretic principles. For each pathway, it constructs a graph of the data points and uses Mutual Information (MI) to score the association between the structure of that graph and the sample phenotypes, while adjusting each score for the effects of random chance.

The main features of MIPath include:

1. **No reliance on linear relationship assumptions**: Unlike many existing methods, MIPath does not assume that the relationships between pathway components, or between pathway activity and phenotype, are linear.
2. **Suitable for single-cell data**: The nearest-neighbor approach avoids comparisons at the individual gene level, making the method scalable and less vulnerable to missing values, which suits single-cell data particularly well.
3. **Few assumptions**: By using mutual information as the fundamental metric, the method makes fewer assumptions about how gene products interact within pathways and how these interactions lead to phenotypic changes.
4. **Fast computation**: Even for large-scale datasets, MIPath can complete the analysis in a relatively short time.

The main steps of the method mentioned in the paper are as follows:

- Construct a graph of the data points for each pathway using a nearest-neighbor algorithm.
- Use the Leiden algorithm to detect modules, identifying groups of data points with similar pathway activity.
- Calculate adjusted mutual information scores to quantify the degree of association between pathway states and sample-specific variables (such as phenotypic annotations).
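
The three steps above can be illustrated with a minimal, standard-library-only sketch. It is not the MIPath implementation: all function names and the toy data are hypothetical, connected components of the nearest-neighbor graph stand in for Leiden module detection, and the adjusted mutual information uses the common arithmetic-mean normalization, which is assumed here rather than taken from the paper.

```python
import math
from collections import Counter

def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph over samples, as adjacency sets."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        # Rank the other samples by Euclidean distance in pathway-gene space.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def connected_components(adj):
    """Module label per sample. ASSUMPTION: a simple stand-in for Leiden."""
    labels = [-1] * len(adj)
    current = 0
    for start in range(len(adj)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mutual_info(u, v):
    n = len(u)
    cu, cv, joint = Counter(u), Counter(v), Counter(zip(u, v))
    return sum(nij / n * math.log(n * nij / (cu[a] * cv[b]))
               for (a, b), nij in joint.items())

def expected_mutual_info(u, v):
    """Expected MI of random labelings that keep the same group sizes."""
    n = len(u)
    lf = lambda x: math.lgamma(x + 1)  # log(x!)
    emi = 0.0
    for a in Counter(u).values():
        for b in Counter(v).values():
            for nij in range(max(1, a + b - n), min(a, b) + 1):
                # Hypergeometric probability of an overlap of size nij.
                log_p = (lf(a) + lf(b) + lf(n - a) + lf(n - b) - lf(n)
                         - lf(nij) - lf(a - nij) - lf(b - nij)
                         - lf(n - a - b + nij))
                emi += nij / n * math.log(n * nij / (a * b)) * math.exp(log_p)
    return emi

def adjusted_mutual_info(u, v):
    """Chance-adjusted MI (arithmetic-mean normalization, an assumption)."""
    mi, emi = mutual_info(u, v), expected_mutual_info(u, v)
    denom = (entropy(u) + entropy(v)) / 2 - emi
    return (mi - emi) / denom if abs(denom) > 1e-12 else 0.0

def pathway_score(expression, phenotype, k=2):
    """Score one pathway: kNN graph -> modules -> adjusted MI with phenotype."""
    modules = connected_components(knn_graph(expression, k))
    return adjusted_mutual_info(modules, phenotype)

# Hypothetical toy data: 8 samples, 2 pathway genes, two well-separated groups.
expression = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
matched = ["ctrl"] * 4 + ["case"] * 4   # phenotype follows the two groups
unrelated = ["ctrl", "case"] * 4        # phenotype ignores the two groups
print(pathway_score(expression, matched))
print(pathway_score(expression, unrelated))
```

On this toy input the matched phenotype scores close to 1, while the unrelated phenotype scores below zero, because the chance adjustment subtracts the mutual information expected from random labelings of the same sizes. A real analysis would replace the connected-components step with a proper Leiden implementation and run the scoring once per pathway.
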
In benchmarks on multiple single-cell datasets, MIPath performed well at identifying target pathways when compared with established and newer pathway analysis methods. The authors also report that its scores are reproducible and sensitive.