Abstract:

Background: Knowledge of which genes are essential to the survival of an organism is critical to understanding the function of genes and to identifying potential drug targets for antimicrobial treatment. Previous statistical methods for assessing essentiality based on sequencing of transposon libraries have usually limited their assessment to strict 'essential' or 'non-essential' categories. However, this binary view of essentiality does not accurately represent the more nuanced ways in which the growth of an organism might be affected by the disruption of its genes. In addition, these methods often limit their analysis to open reading frames. We propose a novel method for analyzing sequence data from transposon mutant libraries using a Hidden Markov Model (HMM), along with formulas to adapt the parameters of the model to different datasets for robustness. This approach allows for the clustering of insertion sites into distinct regions of essentiality across the entire genome in a statistically rigorous manner, while also allowing for the detection of growth-defect and growth-advantage regions.

Results: We evaluate the performance of a 4-state HMM on a sequence dataset of M. tuberculosis transposon mutants. We also test the HMM on several synthetic datasets representing different levels of transposon insertion density and sequence coverage. We show that the HMM produces results that are highly correlated with previous assignments of essentiality for this organism. We also show that it detects growth-defect and growth-advantage genes previously shown to impair or enhance growth when disrupted.

Conclusions: A 4-state HMM provides an improved way of analyzing Tn-seq data and assessing different levels of essentiality. It enables the characterization not only of essential and non-essential genes, but also of genes whose disruption leads to impairment (or enhancement) of growth.
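To make the idea of a 4-state HMM over insertion sites concrete, the following is a minimal sketch of Viterbi decoding on a sequence of read counts, one count per insertion site. It is not the authors' implementation. The abstract names four kinds of region (essential, growth-defect, non-essential, growth-advantage) but gives no parameter formulas, so everything numeric here is an assumption made for illustration: the state labels `ES`, `GD`, `NE`, `GA`, the geometric emission distributions, the state means expressed as multiples of a hypothetical `mean_count`, the uniform initial distribution, and the self-transition probability `stay`.

```python
import math

# Assumed labels: essential, growth-defect, non-essential, growth-advantage.
STATES = ["ES", "GD", "NE", "GA"]


def geometric_logpmf(count, p):
    """Log-probability of `count` under a geometric distribution on 0, 1, 2, ..."""
    return math.log(p) + count * math.log(1.0 - p)


def make_params(mean_count, stay=0.99):
    """Build illustrative HMM parameters (all values are assumptions)."""
    # Assumed mean read count per state, relative to the dataset average.
    means = {
        "ES": 0.01,
        "GD": mean_count / 5.0,
        "NE": mean_count,
        "GA": mean_count * 5.0,
    }
    # A geometric distribution on 0, 1, 2, ... with mean m has p = 1 / (1 + m).
    emit_p = {s: 1.0 / (1.0 + m) for s, m in means.items()}
    # Sticky transitions: stay with probability `stay`, else switch uniformly.
    switch = (1.0 - stay) / (len(STATES) - 1)
    log_trans = {
        a: {b: math.log(stay if a == b else switch) for b in STATES}
        for a in STATES
    }
    log_init = {s: math.log(1.0 / len(STATES)) for s in STATES}
    return emit_p, log_trans, log_init


def viterbi(counts, mean_count, stay=0.99):
    """Return the most likely state label for each insertion site."""
    emit_p, log_trans, log_init = make_params(mean_count, stay)
    score = {
        s: log_init[s] + geometric_logpmf(counts[0], emit_p[s]) for s in STATES
    }
    backptrs = []
    for c in counts[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this site.
            prev = max(STATES, key=lambda r: score[r] + log_trans[r][s])
            ptr[s] = prev
            new_score[s] = (
                score[prev] + log_trans[prev][s] + geometric_logpmf(c, emit_p[s])
            )
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi(counts, mean_count)` with a list of per-site read counts returns one label per site. Because the transitions favor staying in the same state, isolated noisy counts tend to be absorbed into the surrounding region rather than labeled on their own, which is the behavior that lets an HMM cluster sites into contiguous regions independent of gene boundaries.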

Hidden Markov Model Variants and their Application

Annotation of genomics data using bidirectional hidden Markov models unveils variations in Pol II transcription cycle

A Hidden Markov Model for identifying essential and growth-defect regions in bacterial genomes from transposon insertion sequencing data

Coupling hidden Markov models for the discovery of Cis-regulatory modules in multiple species

Inference of genomic landscapes using ordered Hidden Markov Models with emission densities (oHMMed)

Gene Prediction Based On A Generalized Hidden Markov Model And Some Statistical Models Of Related States: A Review

Hidden Markov Models for Gene Sequence Classification: Classifying the VSG genes in the Trypanosoma brucei Genome

Higher-order Markov models for metagenomic sequence classification

Mining gene expression data using a novel approach based on hidden Markov models.

Topological Hidden Markov Models

An Introduction to Hidden Markov Models

What is a hidden Markov model?

A comparative genomic method for computational identification of prokaryotic translation initiation sites

Uncovering ecological state dynamics with hidden Markov models

Fast and accurate haplotype inference with hidden markov model

Hidden Markov model speed heuristic and iterative HMM search procedure

Modeling gene content across a phylogeny to determine when genes become associated

Application of Hidden Markov Model in the Recognition of Splicing Sites

An introduction to infinite HMMs for single molecule data analysis

Statistical Inference in Hidden Markov Models using $k$-segment Constraints

A hidden Markov support vector machine framework incorporating profile geometry learning for identifying microbial RNA in tiling array data