A single-gene-based AI model to identify core and context-specific essential genes by biological interpretation from pooled genome-wide CRISPR and omics data
Chih-Yuan Chou, Jung-Yu Lee, Chia-Hwa Lee, Jinn-Moon Yang
DOI: https://doi.org/10.1101/2024.11.03.621717
2024-11-03
Abstract: Genome-wide Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) screening is a powerful tool for identifying essential genes (EGs) and studying gene function by systematically knocking out individual genes. EGs are required for the survival of organisms and can be divided into core EGs (CEGs) and context-specific EGs, which are crucial for drug development. Although a variety of CEGs have been identified using knockout technology, the concordance among these CEG sets is extremely low, and studies of EG mechanisms are lacking. Developing systematic methods to identify CEGs and reveal the corresponding mechanisms is therefore an important biological issue. To address it, we propose a comprehensive ensemble-based model that uses gene community-regulated pathways to decipher pan-cancer CEGs and context-specific EGs across 29 cancer types and to provide insights into their regulatory abilities for each pathway. The aims of the project include developing a model with Systematic Identification of Essential Gene (SIEG) scores for CEGs and Context-Specific Enrichment (CSE) scores for context-specific EGs. Subsequently, we assess the regulated pathways and mechanisms of the identified EGs by integrating diverse data sources, including genome-wide CRISPR/Cas9 knockout screens, multiple omics data, KEGG pathways, and Gene Ontology. Ultimately, we aim to establish a user-friendly web service. In our preliminary results, we gathered 1,845 genome-wide CRISPR datasets and various omics datasets, including 8,941 clinical samples and 1,346 cell lines, leading to the identification of 3,213 CEGs. Using the SIEG score, we found that 1,178 of these pan-cancer CEGs overlapped with previously defined CEGs, reflecting a 60% similarity rate. These 3,213 CEGs play crucial roles in regulating pathways associated with cell viability, expansion, and proliferation. Additionally, they exhibit characteristics typical of CEGs: they are less favorable as therapeutic targets and occupy central positions within protein-protein interaction networks. Moreover, we delineated six pathway signatures of pan-cancer CEGs, encompassing transcription, translation, protein folding, replication and repair, cell growth and death, and energy. We anticipate that these signatures will contribute to the future redefinition of CEGs.
Bioinformatics