PhalydDB: an Extensive Phage-Derived Lytic Protein Database for Targeted Antimicrobial Engineering Design and Bacterial Host Prediction
Hongquan Gou, Enhao Li, Yilun Xue, Yi Rong, Yihui Zhang, Cheng Chang, Wennan Guo, Shiyun Wang, Jingyang Tu, Chao Lv, Min Li, Jiewen Huang, Xiaokui Guo, Qingtian Li, YongZhang Zhu
DOI: https://doi.org/10.2139/ssrn.4170186
2022-01-01
Abstract: Background: In recent years, phages and phage-derived lytic proteins have attracted increasing attention because of their effectiveness against drug-resistant bacteria. Furthermore, with the development of next-generation sequencing (NGS) and metagenomic technologies, a growing number of phage genome sequences are available for mining lytic proteins and lysin-associated proteins from massive genomic datasets. Methods: To build the phage lytic protein database, 4,594, 16,453, and 108,967 phage nucleotide sequences from three separate sources were comprehensively evaluated. Each protein domain of the phage lytic proteins was assessed for its association with the respective bacterial host. Based on the properties of phage lysin-associated domains, domain combinations were proposed for antimicrobial engineering. A method for predicting phage hosts based on the sequence conservation of phage lytic proteins was systematically developed. Findings: The Phage Lysin-associated Domain and Bacterial Host Prediction Database (PhalydDB) comprises 130,014 phage genomes encoding 198,558 phage lytic proteins. Phages targeting different types of bacterial hosts shared several phage lysin-associated domains (PLADs), and phages with different combinations of PLADs might provide a broad spectrum of antibacterial activity. Two Glyco_hydro family architectures, Glyco_hydro_108 and Glyco_hydro_108+PG_binding_3, were mainly associated with Gram-negative hosts, whereas Glyco_hydro_25+PG_binding_1 was distinctly associated with Gram-positive hosts. In addition, Peptidase_M23+Amidase_2+PG_binding_1, Phage_holin_7_1, and Prim_Pol+AAA_25 were exclusively associated with Mycobacterium, HTH_3+Peptidase_S24+Phage_holin_3_1 with Pseudomonas, and CHAP+Amidase_3+SH3_5 and Phage_Min_Tail+Peptidase_M23 with Staphylococcus. Our host prediction method performed best at a 70% sequence identity threshold, yielding accurate host predictions.
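As a minimal sketch of identity-threshold host assignment, assume each query lytic protein has already been aligned against the reference proteins (for example with BLASTP), and that every hit carries its percent identity and the annotated host of the phage encoding the reference protein. The abstract states only that a 70% identity threshold worked best; the majority-vote rule, the tie-break, and the names `Hit` and `predict_host` are hypothetical and are not PhalydDB's documented procedure.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """One alignment hit of a query lytic protein (hypothetical structure)."""
    subject_id: str   # hypothetical identifier of the reference lytic protein
    identity: float   # percent identity of the alignment, e.g. from BLASTP output
    host: str         # annotated host of the phage encoding the reference protein


def predict_host(hits: list[Hit], min_identity: float = 70.0) -> Optional[str]:
    """Predict a host from hits at or above min_identity (percent).

    Assumption: majority vote over passing hits, with ties broken by the
    best identity seen for each host. Returns None if no hit passes.
    """
    passing = [h for h in hits if h.identity >= min_identity]
    if not passing:
        return None
    votes = Counter(h.host for h in passing)
    best = {}
    for h in passing:
        best[h.host] = max(best.get(h.host, 0.0), h.identity)
    # Rank hosts by (vote count, best identity); the highest pair wins.
    return max(votes, key=lambda host: (votes[host], best[host]))


hits = [
    Hit("refA", 92.5, "Staphylococcus"),
    Hit("refB", 74.0, "Staphylococcus"),
    Hit("refC", 71.2, "Streptococcus"),
    Hit("refD", 45.0, "Listeria"),  # below the threshold, so ignored
]
print(predict_host(hits))        # Staphylococcus
print(predict_host(hits, 95.0))  # None
```

Returning `None` when nothing clears the threshold keeps low-identity queries unassigned instead of forcing a guess, which matches the intuition that the threshold trades coverage for accuracy.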
Interpretation: The large-scale collection of PLADs in PhalydDB enables researchers to design novel phage lytic protein-based antibacterial agents with diverse antibacterial spectra. Because phage lytic proteins are conserved in a host-associated manner, PhalydDB can also be used to predict phage hosts. Together with our host prediction method, PhalydDB allows researchers to combine PLADs that have different host targets to build enzybiotics. Funding: This study was supported by the National Natural Science Foundation of China (Grant No. 32170141).
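The idea of combining PLADs with different host associations can be illustrated with a simple lookup. This is a minimal sketch, assuming a table built from a subset of the domain-host associations reported in the Findings; `PLAD_HOSTS` and `host_coverage` are hypothetical names, not part of PhalydDB. Host labels mix Gram classes and genera exactly as the abstract does, and each combination is treated as an unordered set because the abstract does not state domain order.

```python
# Hypothetical lookup table: a subset of the PLAD combinations and host
# associations listed in the Findings. Keys are frozensets, so domain order
# within a combination is ignored (an assumption, not a PhalydDB rule).
PLAD_HOSTS = {
    frozenset({"Glyco_hydro_108"}): "Gram-negative",
    frozenset({"Glyco_hydro_108", "PG_binding_3"}): "Gram-negative",
    frozenset({"Glyco_hydro_25", "PG_binding_1"}): "Gram-positive",
    frozenset({"Peptidase_M23", "Amidase_2", "PG_binding_1"}): "Mycobacterium",
    frozenset({"Prim_Pol", "AAA_25"}): "Mycobacterium",
    frozenset({"HTH_3", "Peptidase_S24", "Phage_holin_3_1"}): "Pseudomonas",
    frozenset({"CHAP", "Amidase_3", "SH3_5"}): "Staphylococcus",
    frozenset({"Phage_Min_Tail", "Peptidase_M23"}): "Staphylococcus",
}


def host_coverage(panel):
    """Return (host groups covered, combinations absent from the table)."""
    covered, unknown = set(), []
    for combo in panel:
        key = frozenset(combo)
        if key in PLAD_HOSTS:
            covered.add(PLAD_HOSTS[key])
        else:
            unknown.append(tuple(combo))  # report it rather than guess a host
    return covered, unknown


panel = [
    ["PG_binding_3", "Glyco_hydro_108"],
    ["CHAP", "Amidase_3", "SH3_5"],
    ["Unlisted_domain"],  # hypothetical domain not in the table
]
covered, unknown = host_coverage(panel)
print(sorted(covered))  # ['Gram-negative', 'Staphylococcus']
print(unknown)          # [('Unlisted_domain',)]
```

Reporting unmatched combinations separately keeps the coverage estimate limited to associations that were actually observed.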