PLS-based gene subset augmentation and tumor-specific gene identification
Wenjie You,Zijiang Yang,Guoli Ji
DOI: https://doi.org/10.1016/j.compbiomed.2024.108434
IF: 7.7
2024-04-19
Computers in Biology and Medicine
Abstract:In the study of tumor disease pathogenesis, the identification of genes specifically expressed in disease states is pivotal, yet challenges arise from high-dimensional datasets with limited samples. Conventional gene (feature) selection methods often fall short of capturing the complexity of gene-phenotype and gene-gene interactions, necessitating a more robust analysis method. To address these challenges, a gene subset augmentation strategy is proposed in this paper. Our approach introduces diverse perturbation mechanisms to generate distinct gene subsets. The partial least squares-based multiple gene measurement algorithm considers gene-phenotype and gene-gene correlations, identifying differentially expressed genes, including those with weak signals. The constructed gene networks derived from the augmented subsets unveil regulatory patterns, enabling association analysis to explore gene associations comprehensively. Our algorithm excels in identifying small-sized gene subsets with strong discriminative power, surpassing traditional methods that yield a single gene subset. Unlike conventional approaches, our algorithm reveals a spectrum of different gene subsets and their weakly differentially expressed genes. This nuanced perspective aids in unraveling the molecular characteristics and specific expression patterns of tumor genes. The versatility of our approach not only contributes to the advancement of tumor-specific gene identification but also holds promise for addressing challenges in various fields characterized by high-dimensional datasets and limited samples. The Python implementation is available at http://github.com/wenjieyou/PLSGSA .
engineering, biomedical,computer science, interdisciplinary applications,mathematical & computational biology,biology