Abstract: Essential genes are critical for the growth and survival of any organism. Machine learning complements experimental methods by reducing the resources required for essentiality assays. Previous studies revealed the need to discover relevant features that effectively classify essential genes, to improve the generalizability of prediction models across organisms, and to construct a robust gold standard as the class label for the training data. Findings also show that a significant limitation of the machine learning approach lies in predicting conditionally essential genes, since the essentiality status of a gene can change with the specific condition of the organism. This review examines the methods applied to the essential gene prediction task, their strengths and limitations, and the factors responsible for effective computational prediction of essential genes. We discuss categories of features and how they contribute to the classification performance of essentiality prediction models. Five categories of features, namely gene sequence, protein sequence, network topology, homology, and gene ontology-based features, were generated for Caenorhabditis elegans to compare their capacity to predict essentiality. The gene ontology-based feature category outperformed the other categories, mainly because of its high correlation with the genes’ biological functions. However, the topology feature category provided the highest discriminatory power, making it more suitable for essentiality prediction. The major factor limiting machine learning prediction of conditional gene essentiality is the unavailability of labeled data, for the conditions of interest, on which to train a classifier. Therefore, cooperative machine learning could be explored further to obtain models that perform well in conditional essentiality prediction.
Short abstract: Identification of essential genes is imperative because it provides an understanding of core structure and function and accelerates drug target discovery, among other benefits. Recent studies have applied machine learning to complement the experimental identification of essential genes. However, several factors limit the performance of machine learning approaches. This review presents the standard procedure and resources available for predicting essential genes in organisms, and highlights the factors responsible for the current limitations of machine learning in conditional gene essentiality prediction. The choice of features and machine learning technique was identified as an important factor in predicting essential genes effectively.
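
The comparison of feature categories by discriminatory power described in the abstract can be illustrated with a minimal sketch. It assumes that each feature category has already been reduced to one essentiality score per gene (for example, a classifier's predicted probability) and uses ROC AUC as the measure of discriminatory power. The review does not state that it was computed this way. The function names `roc_auc` and `rank_categories` are hypothetical, and the scores and labels are invented toy values, not results from the review.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC: probability that a randomly chosen essential gene (label 1)
    scores higher than a randomly chosen non-essential gene (label 0).
    Tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one gene of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_categories(
    category_scores: Dict[str, List[float]], labels: Sequence[int]
) -> List[Tuple[str, float]]:
    """Return (category, AUC) pairs sorted from most to least discriminative."""
    aucs = {name: roc_auc(s, labels) for name, s in category_scores.items()}
    return sorted(aucs.items(), key=lambda kv: kv[1], reverse=True)


# Invented toy data: 1 = essential, 0 = non-essential (not from the review).
labels = [1, 1, 1, 0, 0, 0]
category_scores = {
    "gene_sequence": [0.6, 0.4, 0.5, 0.5, 0.3, 0.6],
    "network_topology": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "gene_ontology": [0.9, 0.7, 0.8, 0.3, 0.2, 0.4],
}

ranking = rank_categories(category_scores, labels)
for name, auc in ranking:
    print(f"{name}: AUC = {auc:.3f}")
```

In practice the per-gene scores would come from cross-validated predictions of a model trained on each feature category separately, so that the AUC values are comparable and not inflated by training-set fit.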

HELP: A computational framework for labelling and predicting human common and context-specific essential genes

New insights on human essential genes based on integrated multi-omics analysis

Machine learning approach to gene essentiality prediction: a review

New insights on human essential genes based on integrated analysis

New Insights on Human Essential Genes Based on Integrated Analysis and the Construction of the HEGIAP Web-Based Platform

Towards Prediction and Prioritization of Disease Genes by the Modularity of Human Phenome-Genome Assembled Network.

In silico network topology-based prediction of gene essentiality

Recent advances in the characterization of essential genes and development of a database of essential genes

Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways

A single-gene-based AI model to identify core and context-specific essential genes by biological interpretation from pooled genome-wide CRISPR and omics data

CEGSO: Boosting Essential Proteins Prediction by Integrating Protein Complex, Gene Expression, Gene Ontology, Subcellular Localization and Orthology Information

EPGAT: Gene Essentiality Prediction With Graph Attention Networks

DeEPsnap: human essential gene prediction by integrating multi-omics data

OGEE v3: Online GEne Essentiality database with increased coverage of organisms and human cell lines

Prediction of Human Disease Genes by Human-Mouse Conserved Coexpression Analysis

OGEE V2: an Update of the Online Gene Essentiality Database with Special Focus on Differentially Essential Genes in Human Cancer Cell Lines

ECDEP: identifying essential proteins based on evolutionary community discovery and subcellular localization

Essential genes identification model based on sequence feature map and graph convolutional neural network

Genomic Identification and Functional Analysis of Essential Genes in Caenorhabditis elegans

Quantifying Gene Essentiality Based on the Context of Cellular Components.