UniOP: a universal operon prediction for high-throughput prokaryotic (meta-)genomic data using intergenic distance

Hong Su, Johannes Soeding, Ruo Shi Zhang
DOI: https://doi.org/10.1101/2024.11.11.623000
2024-11-11
Abstract: The study of the deluge of metagenomic and genomic sequences is challenging due to the severe lack of functional information. Predicting operons, groups of functionally related genes in prokaryotic genomes, is critical for bridging this gap. However, existing methods for operon prediction rely heavily on experimental data, functional annotations, or extensive characterization of homologous genes, making it difficult to accurately predict operons in newly sequenced or poorly characterized genomes. Here, we introduce UniOP, an unsupervised approach that uses a statistical model to predict operons from intergenic distances derived directly from the target genomic sequence. UniOP not only outperforms alternative approaches on ten complete genomes but also shows superior results on 3269 metagenome-assembled genomes across 13 bacterial and 2 archaeal phyla. Furthermore, we explored enhancing UniOP by incorporating the conservation of gene neighborhood and strandedness in respective genomes and examined the influence of Pfam annotations and motif searching on its performance.
Bioinformatics
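The abstract names intergenic distance as the signal that UniOP models. As a rough illustration of that input only, and not of UniOP's statistical model, the sketch below computes the distance between each pair of adjacent genes on one contig and records whether the two genes share a strand. The `Gene` record, the `intergenic_distances` function, the 1-based inclusive coordinates, and the example genes are all hypothetical and do not come from the paper.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are assumed 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def intergenic_distances(genes: List[Gene]) -> List[Tuple[str, str, int, bool]]:
    """Return (upstream, downstream, distance, same_strand) for adjacent genes.

    Assumes all genes lie on a single contig. A negative distance means the
    two genes overlap.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for up, down in zip(ordered, ordered[1:]):
        # Number of bases strictly between the two genes.
        distance = down.start - up.end - 1
        pairs.append((up.name, down.name, distance, up.strand == down.strand))
    return pairs


# Made-up example data, for illustration only.
genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 410, 900, "+"),
    Gene("geneC", 1500, 2000, "-"),
    Gene("geneD", 1990, 2600, "-"),
]

pairs = intergenic_distances(genes)
for up, down, distance, same_strand in pairs:
    print(f"{up} -> {down}: distance={distance}, same_strand={same_strand}")
```

Only same-strand adjacent pairs can belong to one operon, so a downstream model would typically look at the distances flagged `same_strand=True`. How UniOP turns those distances into operon probabilities is not described in the abstract and is not attempted here.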