NFixDB (Nitrogen Fixation DataBase) - A Comprehensive Integrated Database for Robust ‘Omics Analysis of Diazotrophs

Madeline Bellanger, Jose L. Figueroa III, Lisa Tiemann, Maren L. Friesen, Richard Allen White III
DOI: https://doi.org/10.1101/2024.03.04.583350
2024-03-06
Abstract: Biological nitrogen fixation is a fundamental biogeochemical process in which diazotrophic microbes provide fixed, biologically available nitrogen. Diazotrophs anaerobically fix nitrogen using the nitrogenase enzyme, which has three different gene clusters: 1) molybdenum nitrogenase (nifDHK), the most abundant, followed by its alternatives, 2) vanadium nitrogenase (vnfDHK) and 3) iron nitrogenase (anfDHK). Multiple databases have been constructed as resources for diazotrophic ‘omics analysis; however, an integrated database based on whole genome references does not exist. Here, we present NFixDB (Nitrogen Fixation DataBase), a comprehensive integrated whole genome based database for diazotrophs, which includes all nitrogenases (nifDHK, vnfDHK, anfDHK) and nitrogenase-like enzymes (e.g., nflDH) linked to ribosomal operons (16S-5.8S-23S). NFixDB was computed using Hidden Markov Models (HMMs) against the entire whole genome based Genome Taxonomy Database (GTDB R214), providing searchable reference HMMs for all nitrogenase and nitrogenase-like genes, complete ribosomal operons, both GTDB and NCBI/RefSeq taxonomy, and an SQL database for querying matches. We compared NFixDB to databases from Buckley, Zehr, Mise, and FunGene finding extensive evidence of , in addition to and . NFixDB contains more than 4,000 verified sequences from 50 unique phyla of bacteria and archaea. NFixDB offers the first comprehensive nitrogenase database available to researchers.
Bioinformatics
What problem does this paper attempt to address?
The main problem this paper addresses is the lack of a genome-wide, fully integrated nitrogenase database, which limits researchers' ability to accurately analyze the phylogenetic and metabolic characteristics of diazotrophs. Specifically:

1. **Limitations of existing databases**: Most current nitrogenase databases are built from amplicon sequences. These sequences are incomplete and do not carry enough information to define the phylogeny of diazotrophs or infer their metabolic characteristics. In addition, existing databases such as FunGene, Buckley, Zehr, and Mise provide some data but are not updated regularly and do not include alternative nitrogenases.
2. **Research objectives**: To solve these problems, the paper proposes and constructs a new comprehensive database, NFixDB (Nitrogen Fixation DataBase). The database is based on whole-genome references, covers all types of nitrogenases (nifDHK, vnfDHK, anfDHK) and nitrogenase-like enzymes (such as nflDH), and links them to ribosomal operons (16S-5.8S-23S).
3. **Expected impacts**: With NFixDB, researchers can more comprehensively understand the diversity, distribution, and potential activity of diazotrophs, which provides more accurate data support for fields such as agriculture, food safety, and bioenergy applications. In particular, NFixDB can help scientists better understand the importance of free-living nitrogen fixation (FLNF) in ecosystems and lay the foundation for future nitrogen fixation research.

In summary, this paper aims to fill the gaps in existing nitrogenase databases and provide a genome-wide, fully integrated resource platform to promote in-depth research and application of diazotrophs.
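
The abstract mentions an SQL database for querying HMM matches but does not describe its layout. As a rough illustration of what such a query could look like, the sketch below builds a small in-memory SQLite table and counts genomes per phylum and nitrogenase type. The table name `hmm_hits`, its columns, the E-value cutoff, and all rows are hypothetical placeholders chosen for this example; they are not NFixDB's actual schema or data.

```python
import sqlite3

# Hypothetical schema for illustration only; NFixDB's real tables and
# column names are not described in the text above.
SCHEMA = """
CREATE TABLE hmm_hits (
    genome_id TEXT NOT NULL,
    gene      TEXT NOT NULL,
    phylum    TEXT NOT NULL,
    evalue    REAL NOT NULL
)
"""

# Made-up example rows: (genome, gene hit by an HMM, phylum, E-value).
EXAMPLE_ROWS = [
    ("genome_A", "nifH", "Phylum_1", 1e-120),
    ("genome_A", "nifD", "Phylum_1", 1e-200),
    ("genome_A", "nifK", "Phylum_1", 1e-180),
    ("genome_B", "vnfH", "Phylum_1", 1e-110),
    ("genome_B", "vnfD", "Phylum_1", 1e-150),
    ("genome_C", "anfH", "Phylum_2", 1e-90),
    ("genome_C", "nifH", "Phylum_2", 1e-3),  # weak hit, removed by the cutoff
]


def build_db(rows):
    """Create an in-memory SQLite database and load the example hits."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO hmm_hits VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def genomes_per_family(conn, max_evalue=1e-10):
    """Count distinct genomes per (phylum, gene family).

    The gene family is taken as the first three letters of the gene
    name (nif, vnf, anf); hits weaker than max_evalue are ignored.
    """
    query = """
        SELECT phylum,
               substr(gene, 1, 3) AS gene_family,
               COUNT(DISTINCT genome_id) AS n_genomes
        FROM hmm_hits
        WHERE evalue <= ?
        GROUP BY phylum, gene_family
        ORDER BY phylum, gene_family
    """
    return conn.execute(query, (max_evalue,)).fetchall()


conn = build_db(EXAMPLE_ROWS)
summary = genomes_per_family(conn)
for phylum, family, n_genomes in summary:
    print(phylum, family, n_genomes)
```

Grouping on a gene-family prefix keeps the molybdenum, vanadium, and iron nitrogenase hits separate, which mirrors the distinction the abstract draws between the three gene clusters. A real query would depend on how the published database stores hits and taxonomy.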