Abstract: Essential genes are critical for the growth and survival of any organism. Machine learning approaches complement experimental methods and can minimize the resources required for essentiality assays. Previous studies revealed the need to discover relevant features that significantly discriminate essential genes, to improve the generalizability of prediction models across organisms, and to construct a robust gold standard as the class label for the training data to enhance prediction. Findings also show that a significant limitation of the machine learning approach is the prediction of conditionally essential genes, because the essentiality status of a gene can change under specific conditions of the organism. This review examines various methods applied to the essential gene prediction task, their strengths and limitations, and the factors responsible for effective computational prediction of essential genes. We discuss categories of features and how they contribute to the classification performance of essentiality prediction models. Five categories of features, namely gene sequence, protein sequence, network topology, homology and gene ontology-based features, were generated for Caenorhabditis elegans to compare their capacity to predict essentiality. The gene ontology-based feature category outperformed the other categories, mainly because of its high correlation with the genes’ biological functions. However, the network topology feature category provided the highest discriminatory power, making it more suitable for essentiality prediction. The major factor limiting machine learning prediction of conditional gene essentiality is the unavailability of labeled data for the conditions of interest with which to train a classifier. Therefore, cooperative machine learning could be explored further to build models that perform well in conditional essentiality prediction.
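As an illustration of how features from different categories can be compared, the following Python sketch computes one toy feature from each of two categories (GC content for gene sequence, node degree in an interaction network for network topology) and scores each feature's discriminatory power as its single-feature ROC AUC. This is a minimal sketch under assumptions: the genes, sequences, interaction edges and essentiality labels are hypothetical, and single-feature AUC is one possible measure of discriminatory power, not necessarily the one used in the review.

```python
from collections import defaultdict


def gc_content(seq):
    """Gene sequence feature: fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def degree(edges):
    """Network topology feature: number of interaction partners of each gene."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def single_feature_auc(scores, labels):
    """ROC AUC of one feature used alone as a score: the probability that a random
    essential gene (label 1) scores higher than a random non-essential gene
    (label 0), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one essential and one non-essential gene")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: gene -> (toy nucleotide sequence, label: 1 = essential)
genes = {
    "geneA": ("ATGGCGCGCTAA", 1),
    "geneB": ("ATGAAATTTTAA", 0),
    "geneC": ("ATGGGCCCATAG", 1),
    "geneD": ("ATGATATTATGA", 0),
    "geneE": ("ATGCGCGCGCAA", 0),
}
# Hypothetical protein-protein interaction edges between the genes' products
edges = [("geneA", "geneC"), ("geneA", "geneB"), ("geneA", "geneE"),
         ("geneC", "geneD"), ("geneC", "geneE")]

names = sorted(genes)
labels = [genes[g][1] for g in names]
deg = degree(edges)
features = {
    "gene sequence: GC content": [gc_content(genes[g][0]) for g in names],
    "network topology: degree": [deg[g] for g in names],
}
for name, values in features.items():
    # prints "gene sequence: GC content: AUC = 0.67", then "network topology: degree: AUC = 1.00"
    print(f"{name}: AUC = {single_feature_auc(values, labels):.2f}")
```

On this toy data the degree feature separates the hypothetical essential genes perfectly while GC content does not; the numbers only demonstrate the mechanics and say nothing about C. elegans, where each category would contribute many features.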
Short abstract: Identifying essential genes is imperative because it provides an understanding of an organism's core structure and function and accelerates the discovery of drug targets, among other benefits. Recent studies have applied machine learning to complement the experimental identification of essential genes. However, several factors limit the performance of machine learning approaches. This review aims to present the standard procedure and resources available for predicting essential genes in organisms, and to highlight the factors responsible for the current limitations of machine learning for conditional gene essentiality prediction. The choice of features and machine learning (ML) technique was identified as an important factor in predicting essential genes effectively.
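The standard procedure mentioned above (build feature vectors for genes with known essentiality labels, train a classifier, evaluate it on held-out genes) can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the feature values and labels are hypothetical, and a nearest-centroid classifier with leave-one-out cross-validation stands in for whatever classifier and validation scheme a real study would use.

```python
import math


def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean feature vector (centroid) is closest."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def leave_one_out_accuracy(X, y):
    """Hold out each gene once, train on the others, and return the fraction of
    held-out genes whose essentiality label is predicted correctly."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical labels for six genes (1 = essential, 0 = non-essential)
y = [1, 1, 1, 0, 0, 0]
# Two hypothetical feature sets describing the same six genes
feature_sets = {
    "topology (degree)": [[8], [7], [9], [2], [3], [1]],
    "sequence (GC content)": [[0.55], [0.40], [0.60], [0.45], [0.50], [0.35]],
}
for name, X in feature_sets.items():
    # prints 1.00 for the topology set and 0.67 for the sequence set
    print(f"{name}: leave-one-out accuracy = {leave_one_out_accuracy(X, y):.2f}")
```

Holding the classifier fixed and swapping only the feature set changes the held-out accuracy, which is the kind of comparison behind the point that the choice of features matters. A real study would use a much larger gold-standard set, multi-dimensional feature vectors and a stronger classifier.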

Computational Methods for the Prediction of Microbial Essential Genes

Research on the Computational Prediction of Essential Genes

A Comprehensive Overview of Online Resources to Identify and Predict Bacterial Essential Genes

Exploring The Optimal Strategy To Predict Essential Genes In Microbes

A New Computational Strategy for Predicting Essential Genes

Three computational tools for predicting bacterial essential genes

Comprehensive Review of the Identification of Essential Genes Using Computational Methods: Focusing on Feature Implementation and Assessment

Robust Predictions of Specialized Metabolism Genes Through Machine Learning

Machine learning approach to gene essentiality prediction: a review

Identifying Bacterial Essential Genes Based on a Feature-Integrated Method

An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms

Predicting Bacterial Essential Genes Using Only Sequence Composition Information

A Survey on Computational Methods for Essential Proteins and Genes Prediction

Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways

Computational analysis of genes with lethal knockout phenotype and prediction of essential genes in archaea

Network-based Methods for Predicting Essential Genes or Proteins: a Survey

In silico network topology-based prediction of gene essentiality

Training Set Selection for the Prediction of Essential Genes

Gene Essentiality Prediction Based on Fractal Features and Machine Learning

Prediction of Essential Proteins Based on Gene Expression Programming

Computational Approaches to Predicting Essential Proteins: A Survey