Prophage-DB: A comprehensive database to explore diversity, distribution, and ecology of prophages

Etan Dieppa-Colon, Cody Martin, Karthik Anantharaman
DOI: https://doi.org/10.1101/2024.07.11.603044
2024-07-16
Abstract: Background. Viruses that infect prokaryotes (phages) constitute the most abundant group of biological agents, playing pivotal roles in microbial systems. They are known to impact microbial community dynamics, microbial ecology, and evolution. However, the diversity, host range, infection dynamics, and effects of bacteriophage infection on host cell metabolism remain extremely underexplored. Phages are classified as virulent or temperate based on their life cycles. Temperate phages adopt the lysogenic mode of infection, where the genome integrates into the host cell genome forming a prophage. Prophages enable viral genome replication without host cell lysis, and often contribute novel and beneficial traits to the host genome. Current phage research predominantly focuses on lytic phages, leaving a significant gap in knowledge regarding prophages, including their biology, diversity, and ecological roles.
Results. Here we develop and describe Prophage-DB, a database of prophages, their proteins, and associated metadata that will serve as a resource for viral genomics and microbial ecology. To create the database, we identified and characterized prophages from genomes in three of the largest publicly available databases. We applied several state-of-the-art tools in our pipeline to annotate these viruses, cluster and taxonomically classify them, and detect their respective auxiliary metabolic genes. In total, we identify and characterize over 350,000 prophages and 35,000 auxiliary metabolic genes. Our prophage database is highly representative based on statistical results and contains prophages from a diverse set of archaeal and bacterial hosts which show a wide environmental distribution.
Conclusion. Prophages are particularly overlooked in viral ecology and merit increased attention due to their vital implications for microbiomes and their hosts. Here, we created Prophage-DB to advance our comprehension of prophages in microbiomes through a comprehensive characterization of prophages in publicly available genomes. We propose that Prophage-DB will serve as a valuable resource for advancing phage research, offering insights into viral taxonomy, host relationships, auxiliary metabolic genes, and environmental distribution.
Bioinformatics
What problem does this paper attempt to address?
The problem this paper addresses is the lack of research on prophages, especially regarding their diversity, distribution, and ecology. Specifically:

1. **Insufficient research on prophages**: Although phages (especially lytic phages) play a crucial role in microbial systems, relatively little research has been done on temperate phages and their prophage forms. A prophage is the form a temperate phage takes when its genome integrates into the host genome during lysogenic infection.
2. **Limited understanding of biology, diversity, and ecological roles**: The biological characteristics and diversity of prophages, and their roles in ecosystems, are poorly understood. This includes how prophages affect host-cell metabolism and microbial community dynamics, and their potential roles in global biogeochemical cycles.
3. **Lack of standardized resources**: Existing databases and tools mainly focus on lytic phages, and the comprehensive description and classification of prophages are still insufficient. A comprehensive database is therefore needed to support research on prophage diversity and ecology.

To solve these problems, the authors developed Prophage-DB, a comprehensive database containing prophages, their proteins, and related metadata. The database is intended to help researchers better understand the role of prophages in microbial communities and to serve as a resource for future virology research.

### Main objectives

- **Create a standardized prophage database**: Prophage-DB aims to be an important resource for viral genomics and microbial ecology research.
- **Improve the understanding of prophage diversity**: Identify and characterize a large number of prophages to reveal their distribution across different hosts.
- **Explore the ecological roles of prophages**: Analyze the distribution of prophages in different environments and understand their impact on the structure and function of microbial communities.
- **Discover auxiliary metabolic genes (AMGs)**: Identify AMGs carried in prophages and explore how these genes affect host metabolism and ecosystem processes.

In summary, this paper aims to fill the gaps in prophage research and to provide new perspectives and tools for an in-depth understanding of the interactions between phages and hosts and their roles in ecosystems.