Machine learning classification of archaea and bacteria identifies novel predictive genomic features

Tania Bobbo, Filippo Biscarini, Sachithra K. Yaddehige, Leonardo Alberghini, Davide Rigoni, Nicoletta Bianchi, Cristian Taccioli
DOI: https://doi.org/10.1186/s12864-024-10832-y
IF: 4.547
2024-10-16
BMC Genomics
Abstract: Archaea and Bacteria are distinct domains of life that are adapted to a variety of ecological niches. Several genome-based methods have been developed for their accurate classification, yet the specific genomic features that determine these differences are not fully understood. In this study, we used publicly available whole-genome sequences from bacteria and archaea. From these, a set of genomic features (nucleotide frequencies and proportions; coding sequences (CDS); non-coding, ribosomal and transfer RNA genes (ncRNA, rRNA, tRNA); and Chargaff's, topological entropy and Shannon's entropy scores) was extracted and used as input data to develop machine learning models for the classification of archaea and bacteria.
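As an illustration of the kind of sequence-level features the abstract lists, the following is a minimal sketch of how nucleotide frequencies, GC proportion, and a Shannon entropy score could be computed from a genome string. It is not the authors' pipeline: the function names are hypothetical, and `strand_symmetry_deviation` is only an assumed stand-in for a Chargaff-style score, since the abstract does not define how that score (or topological entropy) was calculated.

```python
import math
from collections import Counter


def nucleotide_frequencies(seq):
    """Relative frequency of A, C, G and T; other symbols (e.g. N) are ignored."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def shannon_entropy(freqs):
    """Shannon entropy (in bits) of the single-nucleotide distribution."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def strand_symmetry_deviation(freqs):
    """Assumed Chargaff-style score: |A - T| + |G - C| on one strand.

    This definition is an illustrative assumption, not the paper's formula.
    """
    return abs(freqs["A"] - freqs["T"]) + abs(freqs["G"] - freqs["C"])


def genome_features(seq):
    """Collect the features into one flat dict, one row per genome."""
    freqs = nucleotide_frequencies(seq)
    return {
        "freq_A": freqs["A"],
        "freq_C": freqs["C"],
        "freq_G": freqs["G"],
        "freq_T": freqs["T"],
        "gc_proportion": freqs["G"] + freqs["C"],
        "shannon_entropy": shannon_entropy(freqs),
        "strand_symmetry_deviation": strand_symmetry_deviation(freqs),
    }


if __name__ == "__main__":
    # Toy sequence for demonstration only; real input would be a whole genome.
    print(genome_features("ATGCGCGATTAACGN"))
```

Each genome would yield one such feature row, and the resulting table, labelled by domain, could then be passed to a classifier of the reader's choice.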
genetics & heredity, biotechnology & applied microbiology